1. Summary. In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum_{j=1}^{n} X_j \le k$, where $X_1, X_2, \ldots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests, $e = \log\rho_1/\log\rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test.

To obtain the above result, use is made of the fact that $P(S_n \le na)$ behaves roughly like $m^n$, where $m$ is the minimum value assumed by the moment generating function of $X - a$.

It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.


2. Introduction. The problem of the efficiency of a test is of relevance to statisticians who are faced with either of the following two problems. The first problem is that of the design of an experiment. The second problem is that of deciding which test combines computational feasibility and efficiency per observation. The measure of efficiency with which we shall deal is especially relevant to problems which involve large samples whose size is determined by the experimenter.

The motivation for the results of this paper may be seen by considering the following simple example. Suppose that under the hypothesis $H_i$,

$$P(X = 1) = p_i, \qquad P(X = 0) = 1 - p_i, \qquad i = 0, 1, \quad p_1 > p_0. \qquad (2.1)$$

Then the likelihood ratio test reduces to that of rejecting $H_0$ if $S_n = \sum_{j=1}^{n} X_j$ exceeds some number $k$. If $n = 400$, $p_0 = .4$, $p_1 = .5$, and $k = 180$, one may reliably proceed to compute the probabilities of error by using the normal approximation to the distribution of $S_n$. On the other hand, if $n$ is very large (say, 1,000,000), the difference between the means of $S_n$ under $H_0$ and $H_1$ is so large compared to the standard deviation of $S_n$ (ratio of 200) that the probabilities to be computed correspond to the extreme tails of the distributions of $S_n$, and the normal approximation is inapplicable. We note that this objection would not be serious for $n = 1{,}000{,}000$ if $p_0$ were very close to $p_1$ (say, $p_0 = .499$), for then, with $q_1 = 1 - p_1$,

$$\frac{np_1 - np_0}{\sqrt{np_1q_1}} = \frac{\sqrt{n}\,(p_1 - p_0)}{\sqrt{p_1q_1}} = 2.$$

¹ This paper was prepared with the support of the Office of Naval Research.
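The two ratios quoted above are plain arithmetic. The short sketch below reproduces them from $(np_1 - np_0)/\sqrt{np_1q_1}$; the function name `mean_gap_in_sds` is ours and purely illustrative.

```python
import math


def mean_gap_in_sds(n: int, p0: float, p1: float) -> float:
    """Illustrative helper (name is ours): (n*p1 - n*p0) / sqrt(n*p1*q1), i.e. the
    distance between the means of S_n under H0 and H1 in standard deviations of S_n."""
    q1 = 1.0 - p1
    return (n * p1 - n * p0) / math.sqrt(n * p1 * q1)


for n, p0 in [(400, 0.4), (1_000_000, 0.4), (1_000_000, 0.499)]:
    print(f"n = {n:>9,}, p0 = {p0}: gap = {mean_gap_in_sds(n, p0, 0.5):.3f} s.d.")
```

With $n = 400$ the gap is 4 standard deviations. With $n = 1{,}000{,}000$ it is 200 for $p_0 = .4$ but only 2 for $p_0 = .499$.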


This situation immediately gives rise to the question of how the probability distribution of $S_n$ behaves in its tails. This question was treated by H. Cramér [1] and is considered in Section 3, where Theorem 1 states that if $a \le E(X)$, $P(S_n \le na)$ is roughly like $m^n$, where $m$ is the minimum value assumed by the moment generating function of $X - a$. In Section 4 this result is applied to obtain a theorem which states the following result: if $k$ is selected to minimize $\beta + \lambda\alpha$, where $\lambda$ is some given positive number and $\alpha = P(S_n > k \mid H_0)$ and $\beta = P(S_n \le k \mid H_1)$ are the probabilities of error, the minimum value of $\beta + \lambda\alpha$ behaves roughly like $\rho^n$, where $\rho$ does not depend on $\lambda$. Now the notion of efficiency is immediately suggested by the equation

$$\rho_1^{n_1} = \rho_2^{n_2}. \qquad (2.2)$$


We may note that in the above example one may be justified in using the normal approximation to the distribution of $S_n$ for relatively large $n$ if $p_1 - p_0$ is small. This tends to suggest that, if the hypotheses $H_0$ and $H_1$ are very "close" to each other, $\rho$ may be approximated by assuming $X$ to be normally distributed. This conjecture is in fact borne out by the theorems of Section 6.


3. The distribution of $S_n$ in the tails. In this section we shall discuss the distribution in the tails of the sum of $n$ independent observations on a chance variable $X$. Excellent results on this problem were obtained by H. Cramér [1] under the conditions that the moment generating function $M(t)$ of $X$ exists (finite) for some interval $-A < t < A$, and that the cumulative distribution function of the chance variables has an absolutely continuous component. This latter condition is not satisfied by discrete distributions. It was imposed in order to apply a bound on the error of the normal approximation to the distribution of a sum of chance variables. C. G. Esseen [2] obtained this bound using only the (finite) existence of third order moments. For the case in which we are interested (i.e., $P(S_n \le na)$), the former condition may also be relaxed so that $M(t)$ exists (finite) for $-A < t \le 0$ if $a < E(X)$.

Since the results of Cramér are much more powerful than we require here, and the (finite) existence of third order moments is not necessary for the results that we desire, we shall state and briefly outline a proof of Theorem 1. Before doing this we shall first formally state some notation and lemmas which we shall use throughout this paper. These lemmas state known results which are rather obvious, depending mainly on Lebesgue's Theorem on integration of monotone sequences [3].
NOTATION 1. $S_n$ is the sum of $n$ independent observations $X_1, X_2, \ldots, X_n$ on a chance variable $X$ with moment generating function $M(t) = E(e^{tX})$ and cumulative distribution function $F(x) = P(X \le x)$. Let

$$m(a) = \inf_t E\bigl(e^{t(X-a)}\bigr) = \inf_t e^{-at}M(t) \qquad (3.1)$$

(infimum with respect to real values of $t$). Unless otherwise specified we shall say that an expectation exists even if it is $+\infty$ or $-\infty$. We shall say that $E(g(X))$ fails to exist if both

$$\int_{g(x) < 0} g(x)\,dF(x) = -\infty \quad\text{and}\quad \int_{g(x) > 0} g(x)\,dF(x) = +\infty.$$

We shall denote by $f(\infty)$ the limit of $f(x)$ as $x$ approaches $\infty$.
LEMMA 1. $M(t)$ attains its minimum value $m(0)$. This value is attained for finite $t$ unless $P(X > 0) = 0$ or $P(X < 0) = 0$. In that event $m(0) = P(X = 0)$.

LEMMA 2. If $P(X \le 0) > 0$ and $P(X \ge 0) > 0$, then $m(0) > 0$.

LEMMA 3. For all $t$ in the interior of the interval of finite existence of $M(t)$,

$$\frac{dM}{dt} = \int x e^{tx}\,dF(x) \qquad (3.2)$$

and

$$\frac{d^2M}{dt^2} = \int x^2 e^{tx}\,dF(x) \ge 0. \qquad (3.3)$$

$d^2M/dt^2 = 0$ if and only if $P(X = 0) = 1$.


LEMMA 4. If $u_1(t), u_2(t), \ldots, u_n(t), \ldots$ is a nondecreasing sequence of functions continuous in the closed interval $[a, b]$, …

Furthermore,

$$\lim_{n\to\infty}\Bigl[\inf_{a \le t \le b} u_n(t)\Bigr] = \inf_{a \le t \le b}\Bigl[\lim_{n\to\infty} u_n(t)\Bigr]. \qquad (3.4)$$

This statement applies to the extended case where $u_n(t)$ may take on the value $\infty$ and to the case where $a = -\infty$, providing $u_n(-\infty) = \lim_{t\to-\infty} u_n(t)$.
THEOREM 1. If $E(X) > -\infty$ and $a \le E(X)$, then

$$P(S_n \le na) \le [m(a)]^n. \qquad (3.5)$$

If $E(X) < +\infty$ and $a \ge E(X)$, then

$$P(S_n \ge na) \le [m(a)]^n. \qquad (3.6)$$

If $0 < \varepsilon < m(a)$ ($E(X)$ need not exist),

$$\lim_{n\to\infty}\frac{(m(a) - \varepsilon)^n}{P(S_n \le na)} = 0 \qquad (3.7)$$

and

$$\lim_{n\to\infty}\frac{(m(a) - \varepsilon)^n}{P(S_n \ge na)} = 0. \qquad (3.8)$$
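Theorem 1 can be checked numerically in the simplest case. The sketch below assumes, purely for illustration, that $X$ is Bernoulli($p$), so that $S_n$ is binomial. It computes $m(a)$ directly from definition (3.1) by a convex one-dimensional search and compares it with the exact log tail probability. The bound (3.5) should hold for every $n$, and $P(S_n \le na)^{1/n}$ should creep up towards $m(a)$, as (3.7) asserts. The helper names are ours.

```python
import math


def m_of_a(a: float, p: float, lo: float = -50.0, hi: float = 50.0, iters: int = 200) -> float:
    """m(a) = inf_t e^{-at} M(t), eq. (3.1), for X ~ Bernoulli(p) with M(t) = q + p e^t.

    e^{-at} M(t) is convex in t, so ternary search on [lo, hi] finds the infimum
    when 0 < a < 1 (the minimiser is then finite)."""
    q = 1.0 - p

    def f(t: float) -> float:
        return math.exp(-a * t) * (q + p * math.exp(t))

    for _ in range(iters):
        t1 = lo + (hi - lo) / 3
        t2 = hi - (hi - lo) / 3
        if f(t1) < f(t2):
            hi = t2
        else:
            lo = t1
    return f((lo + hi) / 2)


def log_binom_cdf(n: int, k: int, p: float) -> float:
    """log P(S_n <= k) for S_n ~ Binomial(n, p), via lgamma and a log-sum-exp."""
    q = 1.0 - p
    logs = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(p) + (n - j) * math.log(q)
        for j in range(k + 1)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


p, a = 0.5, 0.3        # illustrative values; a <= E(X) = p, as (3.5) requires
m = m_of_a(a, p)
for n in (10, 100, 1000):
    k = (3 * n) // 10  # S_n <= 0.3 n  <=>  S_n <= floor(0.3 n), computed without rounding
    log_tail = log_binom_cdf(n, k, p)
    print(n, log_tail <= n * math.log(m), math.exp(log_tail / n), m)
```

Working in log space keeps the computation stable for large $n$, where both $P(S_n \le na)$ and $[m(a)]^n$ underflow ordinary floating point long before the comparison becomes interesting.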


PROOF. We present here a brief sketch of a proof of Theorem 1. We note first that it suffices to prove (3.5) for $a = 0 \le E(X)$, and (3.7) for $a = 0$. Using the extended Tchebycheff inequality [4],

$$E(e^{tS_n}) = [M(t)]^n \ge P[S_n \le 0] \quad\text{for } t \le 0. \qquad (3.9)$$

Hence

$$P[S_n \le 0] \le \Bigl[\inf_{t \le 0} M(t)\Bigr]^n. \qquad (3.10)$$

But $a = 0 \le E(X)$ implies that

$$\inf_{t \le 0} M(t) = \inf_t M(t) = m(0). \qquad (3.11)$$

To establish equation (3.7) we note that it is sufficient to treat the case $a = 0$. Then we see that the cases where $P(X > 0) = 0$ and where $P(X < 0) = 0$ are trivial. Hereafter we shall assume that $P(X > 0) > 0$ and $P(X < 0) > 0$.

We shall now treat the discrete (but not necessarily bounded) case where $P(X = x_i) = p_i > 0$, $i = 1, 2, \ldots$. Given $\varepsilon > 0$, one may select an integer $r$ so that

$$\min(x_1, x_2, \ldots, x_r) < 0 < \max(x_1, x_2, \ldots, x_r) \qquad (3.12)$$

and

$$\inf_t\Bigl\{\sum_{i=1}^{r} e^{tx_i}p_i\Bigr\} > \inf_t\Bigl\{\sum_{i=1}^{\infty} e^{tx_i}p_i\Bigr\} - \varepsilon. \qquad (3.13)$$

In fact, let

$$m^* = \sum_{i=1}^{r} e^{t^*x_i}p_i = \inf_t\Bigl\{\sum_{i=1}^{r} e^{tx_i}p_i\Bigr\}. \qquad (3.14)$$

For this discrete case it now suffices to show that for sufficiently large $n$ there are $r$ positive integers $n_1, n_2, \ldots, n_r$ such that

$$\sum_{i=1}^{r} n_i = n, \qquad (3.15)$$

$$\sum_{i=1}^{r} n_i x_i \le 0,$$

…
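For large $n_1, n_2, \ldots, n_r$ (not necessarily integers) Stirling's Formula gives us

$$P(n_1, n_2, \ldots, n_r) = \frac{n!}{n_1!\cdots n_r!}\prod_{i=1}^{r}p_i^{n_i} \approx \Bigl(\frac{n}{(2\pi)^{r-1}\prod_{i=1}^{r}n_i}\Bigr)^{1/2}\prod_{i=1}^{r}\Bigl(\frac{np_i}{n_i}\Bigr)^{n_i}. \qquad (3.18)$$

Now

$$\prod_{i=1}^{r}\Bigl(\frac{np_i}{n_i}\Bigr)^{n_i} \qquad (3.19)$$

can be shown by the method of Lagrange multipliers to attain a maximum of $(m^*)^n$ subject to the restrictions

$$\sum_{i=1}^{r} n_i = n, \qquad (3.20)$$

$$\sum_{i=1}^{r} n_i x_i = 0, \qquad (3.21)$$

$$n_i > 0, \qquad (3.22)$$

and the maximizing values of $n_1, n_2, \ldots, n_r$ are

$$n_i^0 = np_ie^{t^*x_i}/m^*. \qquad (3.23)$$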
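Assuming that $x_1 \le x_i$ for $i \le r$, we let

$$n_i' = [n_i^0], \qquad i = 2, 3, \ldots, r, \qquad (3.24)$$

$$n_1' = n - \sum_{i=2}^{r} n_i', \qquad (3.25)$$

where $[n_i^0]$ represents the greatest integer less than or equal to $n_i^0$. Then for large $n$, the $n_i'$ are large positive integers adding up to $n$ for which

$$\sum_{i=1}^{r} n_i'x_i \le 0 \qquad (3.26)$$

(indeed, by (3.20), (3.21) and (3.25), $\sum_i n_i'x_i = \sum_{i\ge 2}(n_i^0 - n_i')(x_1 - x_i) \le 0$) and

$$\lim_{n\to\infty}\bigl[P(n_1', n_2', \ldots, n_r')\bigr]^{1/n} = m^*, \qquad (3.27)$$

which was to be shown.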
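The Lagrange-multiplier step (3.19) to (3.23) can be checked for a small discrete distribution. The sketch below uses a hypothetical four-point distribution of our own choosing. It finds $t^*$ by bisection on the derivative of $\sum_i p_ie^{tx_i}$ (increasing in $t$, and changing sign because of (3.12)). It then confirms that the tilted counts $n_i^0 = np_ie^{t^*x_i}/m^*$ satisfy (3.20) and (3.21), give the value $(m^*)^n$ for the product (3.19), and beat a feasible perturbation.

```python
import math


def tilted_counts(xs, ps, n, lo=-50.0, hi=50.0, iters=200):
    """Return (t_star, m_star, counts): m_star = inf_t sum_i p_i e^{t x_i} (eq. 3.14) and
    counts[i] = n p_i e^{t_star x_i} / m_star (eq. 3.23). Needs min(xs) < 0 < max(xs)."""
    def g_prime(t):
        return sum(p * x * math.exp(t * x) for x, p in zip(xs, ps))

    for _ in range(iters):  # g' is increasing, so bisection on its sign finds the root
        mid = (lo + hi) / 2
        if g_prime(mid) < 0:
            lo = mid
        else:
            hi = mid
    t_star = (lo + hi) / 2
    m_star = sum(p * math.exp(t_star * x) for x, p in zip(xs, ps))
    counts = [n * p * math.exp(t_star * x) / m_star for x, p in zip(xs, ps)]
    return t_star, m_star, counts


def log_product(counts, ps, n):
    """log of prod_i (n p_i / n_i)^{n_i}, the expression (3.19)."""
    return sum(c * math.log(n * p / c) for c, p in zip(counts, ps))


xs = [-2.0, -1.0, 1.0, 3.0]   # hypothetical support points
ps = [0.1, 0.3, 0.4, 0.2]     # hypothetical probabilities
n = 1000
t_star, m_star, counts = tilted_counts(xs, ps, n)
print("sum n_i =", sum(counts), " sum n_i x_i =", sum(c * x for c, x in zip(counts, xs)))
print("log product =", log_product(counts, ps, n), " n log m* =", n * math.log(m_star))

# A direction d with sum(d) = 0 and sum(d * x) = 0 keeps (3.20)-(3.21) satisfied.
d = [1.0, -1.5, 0.5, 0.0]
perturbed = [c + 5.0 * di for c, di in zip(counts, d)]
print("perturbed log product =", log_product(perturbed, ps, n))
```

The perturbed counts still satisfy both restrictions but give a smaller product, as expected for a maximum of the concave function $\sum_i n_i\log(np_i/n_i)$.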
We shall now treat the general case. Let

$$X^{(s)} = \frac{i}{s} \quad\text{if}\quad \frac{i-1}{s} < X \le \frac{i}{s}, \qquad i = 0, \pm 1, \pm 2, \ldots. \qquad (3.28)$$

If $S_n^{(s)}$ represents the sum of the $X^{(s)}$ for $n$ independent observations then, since $X^{(s)} \ge X$,

$$P(S_n \le 0) \ge P(S_n^{(s)} \le 0) \qquad (3.29)$$

and

$$M^{(s)}(t) = E\bigl(e^{tX^{(s)}}\bigr) \ge e^{-|t|/s}M(t). \qquad (3.30)$$

Since $P(X > 0) > 0$ and $P(X < 0) > 0$, $M(t)$ attains its minimum for a finite value of $t$, and hence there is an $s$ sufficiently large so that

$$\inf_t\{M^{(s)}(t)\} \ge \inf_t\{M(t)\} - \varepsilon/2. \qquad (3.31)$$

Our theorem follows from the result for the discrete case and equation (3.29).


4. The measure of asymptotic efficiency. In this section some elementary monotonicity and continuity properties of $m(a)$ are obtained. These properties are then used to obtain an index $\rho$ for a test. This index has the property that if $k$ is chosen to minimize

$$\beta + \lambda\alpha = P[S_n \le k \mid H_1] + \lambda P[S_n > k \mid H_0], \qquad (4.1)$$

the minimum value of $\beta + \lambda\alpha$ is roughly $\rho^n$. Furthermore, $\rho$ is independent of $\lambda$. From this it is easily seen that if $\rho_1$ and $\rho_2$ are the indices of two tests, $\log\rho_1/\log\rho_2$ is an appropriate measure of the relative efficiency of these tests.

NOTATION 2. Let $a_*$ be defined by

$$P(X < a_*) = 0 \qquad (4.2)$$

and

$$P(X < a_* + \varepsilon) > 0 \quad\text{for every } \varepsilon > 0. \qquad (4.3)$$

Let $t(a)$ be given by

$$m(a) = e^{-a\,t(a)}M[t(a)]. \qquad (4.4)$$

Note that Lemma 1 implies that $t(a)$ exists and that Lemma 3 implies that $t(a)$ is unique unless $P(X = a) = 1$.

LEMMA 5. If $E(X) > -\infty$ and $M(t) = \infty$ for $t < 0$, then $t(a) = 0$ and $m(a) = 1$ for $a \le E(X)$.

PROOF. From the proof of (3.5), it follows that $t(a) \le 0$ for $a \le E(X)$. Lemma 5 follows immediately.

LEMMA 6. If $M(t) < \infty$ for some $t < 0$, then $E(X) > -\infty$. Furthermore,

$$m(a) = 0, \qquad a < a_*, \qquad (4.5)$$

$$m(a_*) = P(X = a_*), \qquad (4.6)$$

and

$$m[E(X)] = 1. \qquad (4.7)$$

Also, $m(a)$ is continuous and strictly monotone increasing for $a_* \le a \le E(X)$.
PROOF. That $E(X)$ exists (finite) or is $+\infty$ is apparent. For $a < a_*$ and $t < 0$,

$$\int e^{t(x-a)}\,dF(x) \le e^{t(a_*-a)}. \qquad (4.8)$$

Hence $m(a) = 0$. Now we note that if $a_*$ is finite, then for $t < 0$

$$P(X = a_*) \le \int e^{t(x-a_*)}\,dF(x), \qquad (4.9)$$

$$\lim_{t\to-\infty}\int e^{t(x-a_*)}\,dF(x) = P(X = a_*), \qquad (4.10)$$

and hence $m(a_*) = P(X = a_*)$. If $a_* = -\infty$, then for a fixed $t < 0$ with $M(t) < \infty$,

$$\lim_{a\to-\infty}\int e^{t(x-a)}\,dF(x) = 0, \qquad (4.11)$$

so that $\lim_{a\to-\infty} m(a) = 0$. Now we note that

$$\lim_{t\to 0^-}\frac{d}{dt}\bigl[e^{-at}M(t)\bigr] = \int (x - a)\,dF(x). \qquad (4.12)$$

Since $(d^2/dt^2)[e^{-at}M(t)] > 0$ unless $P(X = a) = 1$ (in which case Lemma 6 is valid), it follows that $t(a) < 0$ for $a < E(X)$ and $t(a) = 0$ for $a = E(X)$. Hence $m[E(X)] = 1$, and $m(a) < 1$ for $a < E(X)$.

We shall now show that for $a_* < a < E(X)$, $t(a)$ is finite and a non-decreasing function of $a$, while $m(a)$ is strictly increasing for $a_* \le a \le E(X)$. The finiteness of $t(a)$ follows from

$$\int e^{t(x-a)}\,dF(x) \ge P(X < a - \varepsilon)\,e^{-t\varepsilon} \qquad (4.13)$$

for $t < 0$, $\varepsilon > 0$. Therefore, for $h > 0$,

$$m(a - h) \le \int e^{t(a)(x-a+h)}\,dF(x) < m(a), \qquad (4.14)$$

and furthermore

$$\int e^{t'(x-a+h)}\,dF(x) > \int e^{t(a)(x-a+h)}\,dF(x) \qquad (4.15)$$

for $t' > t(a)$, $h > 0$.

It suffices now to show that $m(a)$ is continuous on the right for $a < E(X)$ and continuous on the left for $a_* < a \le E(X)$. Given $a < E(X)$ and $\varepsilon > 0$, there is a finite $t'$ so that

$$\int e^{t'(x-a)}\,dF(x) \le m(a) + \varepsilon, \qquad (4.16)$$

$$\lim_{h\to 0^+} m(a + h) \le \lim_{h\to 0^+}\int e^{t'(x-a-h)}\,dF(x) \le m(a) + \varepsilon. \qquad (4.17)$$

Given $a$ so that $a_* < a \le E(X)$, there is an $h_1 > 0$ so that $a_* < a - h_1$. For $t(a - h_1) \le t \le t(a)$, $\int e^{t(x-a+h)}\,dF(x)$ converges uniformly to $\int e^{t(x-a)}\,dF(x)$ as $h \to 0^+$. Hence $\lim_{h\to 0^+} m(a - h) \ge m(a)$.
h—- 0+ 


NOTATION 3. $H_0$ and $H_1$ are two hypotheses which specify the distribution of $X$ so that $\mu_0 = E(X \mid H_0) \le \mu_1 = E(X \mid H_1)$. For each value of $a$ we consider a test which consists of rejecting $H_0$ if $S_n > na$. Let $\alpha = P(S_n > na \mid H_0)$, $\beta = P(S_n \le na \mid H_1)$, and let $\lambda$ be any (finite) positive number given in advance. Let

$$\rho(a) = \max[m_0(a), m_1(a)], \qquad (4.18)$$

where

$$m_i(a) = \inf_t E\bigl(e^{t(X-a)} \mid H_i\bigr), \qquad i = 0, 1. \qquad (4.19)$$

Furthermore, let us define the index of the test determined by $X$ by

$$\rho = \inf_{\mu_0 \le a \le \mu_1}\rho(a). \qquad (4.20)$$

We note that in the event that it is desired to use a test where we reject $H_0$ if $S_n \ge na$, one may replace $X$ by $-X$ and interchange $H_0$ and $H_1$. The value of $\rho$ is not affected by this transformation.

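The customary procedure of minimizing $\beta$ for a fixed value of $\alpha$ does not seem very appropriate when the sample size approaches infinity. We shall instead deal with tests which minimize $\beta + \lambda\alpha$ for some fixed value of $\lambda$, $0 < \lambda < \infty$. Such a test is a "Bayes Solution" corresponding to some a priori probability of $H_1$ which depends on $\lambda$. The study of Bayes Solutions may here be justified on grounds not involving any belief in a priori probabilities. In particular, if it is desired to minimize some function $F(\alpha, \beta)$ for large samples and neither $\partial F/\partial\alpha$ nor $\partial F/\partial\beta$ vanishes at $\alpha = \beta = 0$, then, since to first order $F(\alpha, \beta) - F(0, 0) \approx \alpha\,\partial F/\partial\alpha + \beta\,\partial F/\partial\beta$, the minimizing test will correspond to a Bayes Solution where $\lambda$ is close to

$$\frac{\partial F(0,0)/\partial\alpha}{\partial F(0,0)/\partial\beta}.$$

THEOREM 2. Let $\varepsilon > 0$ and $0 < \lambda < \infty$. Then

$$\lim_{n\to\infty}\Bigl\{\inf_a(\beta + \lambda\alpha)/(\rho + \varepsilon)^n\Bigr\} = 0, \qquad (4.21)$$

and if $0 < \varepsilon < \rho$,

$$\lim_{n\to\infty}\Bigl\{\inf_a(\beta + \lambda\alpha)/(\rho - \varepsilon)^n\Bigr\} = \infty. \qquad (4.22)$$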


PROOF. There is a value $a_1$ of $a$ so that $\rho(a_1) \le \rho + \varepsilon/2$. Applying Theorem 1, equation (4.21) follows immediately. Now we note that

$$\beta = P(S_n \le na \mid H_1) \ge P(S_n \le na_2 \mid H_1) \quad\text{for } a \ge a_2, \qquad (4.23)$$

$$\inf_{a \ge a_2}(\beta + \lambda\alpha) \ge P(S_n \le na_2 \mid H_1), \qquad (4.24)$$

$$\alpha = P(S_n > na \mid H_0) \ge P(S_n \ge na_2 \mid H_0) \quad\text{for } a < a_2, \qquad (4.25)$$

$$\inf_{a < a_2}(\beta + \lambda\alpha) \ge \lambda P(S_n \ge na_2 \mid H_0). \qquad (4.26)$$

Theorem 1 gives us our result as soon as we show the existence of an $a_2$ in $[\mu_0, \mu_1]$ so that both $m_0(a_2) \ge \rho$ and $m_1(a_2) \ge \rho$. To this end we consider

$$F = \{a : m_1(a) \ge \rho,\ \mu_0 \le a \le \mu_1\}. \qquad (4.27)$$

The set $F$ is not empty because $m_1(\mu_1) = 1 \ge \rho$. Let $a_2 = \text{g.l.b.}\,F$. By continuity on the right, $m_1(a_2) \ge \rho$. Also $m_1(a) < \rho$ for $a < a_2$. Hence, since $\rho(a) \ge \rho$, $m_0(a) \ge \rho$ if $\mu_0 \le a < a_2$. Since $m_0(a)$ is continuous on the left for $a > \mu_0$, $m_0(a_2) \ge \rho$ if $a_2 > \mu_0$. Furthermore, if $a_2 = \mu_0$, $m_0(a_2) = 1 \ge \rho$.

NOTATION 4. Let $\rho_1$ and $\rho_2$ represent the indices of two tests $T_1$ and $T_2$, respectively. We define the asymptotic relative efficiency of $T_1$ to $T_2$ by

$$e = \log\rho_1/\log\rho_2, \qquad (4.28)$$

where $e$ is undefined if $\rho_1 = \rho_2 = 1$. For test $T_i$, $n_i$ is the sample size and

$$\gamma_i = \inf(\beta + \lambda\alpha) \qquad (4.29)$$

is a function of $n_i$ and $\lambda$. The appropriateness of the use of $e$ as a measure of efficiency derives from the following theorem, which is an immediate consequence of Theorem 2.

THEOREM 3. If $\lim_{n_1, n_2\to\infty} n_2/n_1 < e$ ($> e$), then $\lim_{n_1, n_2\to\infty}\gamma_2/\gamma_1 = \infty$ ($= 0$).

Note that $e$ does not depend on $\lambda$.
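To make (4.28) concrete, here is an illustration of our own (not an example worked in the paper). Take the hypothetical hypotheses $H_0: X \sim N(0, 1)$ and $H_1: X \sim N(\delta, 1)$. The "mean" test sums the observations themselves, so by (5.3) of Example 1 its index is $e^{-\delta^2/8}$. The "sign" test sums the indicators $1\{X_j > 0\}$, which are Bernoulli with $p_0 = 1/2$ and $p_1 = \Phi(\delta)$. The sketch computes the index of the sign test from (4.18) to (4.20), locating the crossing of $m_0$ and $m_1$ by bisection, and then forms $e = \log\rho_{\text{sign}}/\log\rho_{\text{mean}}$. For small $\delta$ the value approaches $2/\pi \approx 0.64$.

```python
import math


def log_m_bernoulli(a: float, p: float) -> float:
    """log m(a) for X ~ Bernoulli(p), 0 < a < 1 (the r = 1 case of (5.11))."""
    q = 1.0 - p
    return (1 - a) * math.log(q / (1 - a)) + a * math.log(p / a)


def bernoulli_index(p0: float, p1: float, iters: int = 200) -> float:
    """rho = inf_{p0 <= a <= p1} max[m0(a), m1(a)], eqs. (4.18)-(4.20).

    On [p0, p1], m0 falls from 1 and m1 rises to 1, so the infimum of the
    maximum is at their crossing, which bisection locates."""
    lo, hi = p0, p1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if log_m_bernoulli(mid, p0) > log_m_bernoulli(mid, p1):
            lo = mid
        else:
            hi = mid
    return math.exp(log_m_bernoulli((lo + hi) / 2, p0))


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sign_vs_mean_efficiency(delta: float) -> float:
    """Hypothetical example: e = log rho_sign / log rho_mean for N(0,1) against N(delta,1)."""
    log_rho_mean = -delta ** 2 / 8                  # (5.3) with sigma0 = sigma1 = 1
    rho_sign = bernoulli_index(0.5, normal_cdf(delta))
    return math.log(rho_sign) / log_rho_mean


for delta in (1.0, 0.5, 0.1, 0.01):
    print(delta, sign_vs_mean_efficiency(delta))
print("2/pi =", 2 / math.pi)
```

Read through Theorem 3, a value of $e$ near 0.64 means that the sign test needs roughly $1/0.64 \approx 1.6$ times as many observations as the mean test for comparable error probabilities.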


5. Some examples. In this section we shall determine the behaviour of $m(a)$ and $\rho$ for a few simple examples.

EXAMPLE 1. Let $X$ be normally distributed with mean $\mu_i$ and variance $\sigma_i^2$ under hypothesis $H_i$, $i = 0, 1$ ($\mu_0 < \mu_1$). Then

$$e^{-at}M_i(t) = e^{t(\mu_i - a) + t^2\sigma_i^2/2}, \qquad (5.1)$$

$$m_i(a) = e^{-(\mu_i - a)^2/2\sigma_i^2}, \qquad (5.2)$$

$$\rho = \rho\Bigl(\frac{\sigma_1\mu_0 + \sigma_0\mu_1}{\sigma_1 + \sigma_0}\Bigr) = e^{-\frac12[(\mu_1 - \mu_0)/(\sigma_1 + \sigma_0)]^2}. \qquad (5.3)$$

Of course this index applies to a test which is not the likelihood ratio test unless $\sigma_1 = \sigma_0$. The computational problem in obtaining the index of the likelihood ratio test is considerable. However, the results in Section 6 may be easily applied to the likelihood ratio test if $\mu_1 - \mu_0$ and $\sigma_1 - \sigma_0$ are small.
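Formula (5.3) can be confirmed numerically. The sketch below, with illustrative parameter values of our choosing, evaluates $m_0$ and $m_1$ from (5.2), finds where they cross on $[\mu_0, \mu_1]$ by bisection, and compares the crossing point and the common value with $(\sigma_1\mu_0 + \sigma_0\mu_1)/(\sigma_1 + \sigma_0)$ and with (5.3).

```python
import math


def log_m_normal(a: float, mu: float, sigma: float) -> float:
    """log m(a) = -(mu - a)^2 / (2 sigma^2) for X ~ N(mu, sigma^2), eq. (5.2)."""
    return -((mu - a) ** 2) / (2 * sigma ** 2)


def normal_index(mu0, sigma0, mu1, sigma1, iters=200):
    """Return (a, rho) with m0(a) = m1(a) on [mu0, mu1]; rho is the common value."""
    lo, hi = mu0, mu1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if log_m_normal(mid, mu0, sigma0) > log_m_normal(mid, mu1, sigma1):
            lo = mid          # m0 still above m1: the crossing lies further right
        else:
            hi = mid
    a = (lo + hi) / 2
    return a, math.exp(log_m_normal(a, mu0, sigma0))


mu0, sigma0, mu1, sigma1 = 0.0, 1.0, 1.0, 2.0   # illustrative values
a, rho = normal_index(mu0, sigma0, mu1, sigma1)
a_formula = (sigma1 * mu0 + sigma0 * mu1) / (sigma1 + sigma0)
rho_formula = math.exp(-0.5 * ((mu1 - mu0) / (sigma1 + sigma0)) ** 2)
print(a, a_formula)        # both approximately 1/3
print(rho, rho_formula)    # both approximately exp(-1/18)
```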
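EXAMPLE 2. Let $X/\sigma_i^2$ have a chi-square distribution with $r$ degrees of freedom under hypothesis $H_i$, $i = 0, 1$ ($\sigma_0^2 < \sigma_1^2$). Then, for $t < 1/(2\sigma_i^2)$,

$$e^{-at}M_i(t) = e^{-at}(1 - 2\sigma_i^2t)^{-r/2}, \qquad (5.4)$$

$$\log m_i(a) = -\tfrac12\Bigl[\frac{a}{\sigma_i^2} - r + r\log\frac{r\sigma_i^2}{a}\Bigr], \qquad (5.5)$$

$$\log\rho = -\tfrac12 r[\delta - 1 - \log\delta], \qquad (5.6)$$

where

$$\delta = (\log\tau)/(\tau - 1), \qquad (5.7)$$

$$\tau = \sigma_1^2/\sigma_0^2. \qquad (5.8)$$

Note that as $\tau$ approaches 1, $\log\rho \approx -r(\tau - 1)^2/16$.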
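The closed forms of Example 2 can be cross-checked numerically. In the sketch below (parameter values are our own illustrative choices), (5.5) is compared with a direct convex minimisation of $\log[e^{-at}M(t)]$ over $t < 1/(2\sigma^2)$. The crossing of $m_0$ and $m_1$ on $[r\sigma_0^2, r\sigma_1^2]$ is compared with (5.6) to (5.8), and the small-$\tau$ approximation is checked for $\tau$ close to 1.

```python
import math


def log_m_chi2(a: float, r: int, var: float) -> float:
    """log m_i(a) from (5.5), for X / var ~ chi-square with r degrees of freedom."""
    return -0.5 * (a / var - r + r * math.log(r * var / a))


def log_m_chi2_numeric(a: float, r: int, var: float, iters: int = 300) -> float:
    """Minimise -a t - (r/2) log(1 - 2 var t) over t < 1/(2 var) by ternary search
    (the function is convex in t), as an independent check on (5.5)."""
    def h(t: float) -> float:
        return -a * t - 0.5 * r * math.log(1 - 2 * var * t)

    lo, hi = -100.0, 1.0 / (2 * var) - 1e-12
    for _ in range(iters):
        t1, t2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if h(t1) < h(t2):
            hi = t2
        else:
            lo = t1
    return h((lo + hi) / 2)


def chi2_log_index(r: int, var0: float, var1: float, iters: int = 200) -> float:
    """log rho from the crossing of (5.5) for var0 and var1 on [r var0, r var1]."""
    lo, hi = r * var0, r * var1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if log_m_chi2(mid, r, var0) > log_m_chi2(mid, r, var1):
            lo = mid
        else:
            hi = mid
    return log_m_chi2((lo + hi) / 2, r, var0)


def chi2_log_index_formula(r: int, var0: float, var1: float) -> float:
    """(5.6)-(5.8): log rho = -(r/2)[delta - 1 - log delta]."""
    tau = var1 / var0
    delta = math.log(tau) / (tau - 1)
    return -0.5 * r * (delta - 1 - math.log(delta))


print(log_m_chi2(7.0, 5, 1.0), log_m_chi2_numeric(7.0, 5, 1.0))
print(chi2_log_index(5, 1.0, 2.0), chi2_log_index_formula(5, 1.0, 2.0))
print(chi2_log_index_formula(5, 1.0, 1.001), -5 * 0.001 ** 2 / 16)
```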
EXAMPLE 3. Let $X$ have the binomial distribution so that

$$P(X = j \mid H_i) = \binom{r}{j}p_i^jq_i^{r-j}, \qquad q_i = 1 - p_i, \quad i = 0, 1, \quad j = 0, 1, \ldots, r \quad (p_0 < p_1). \qquad (5.9)$$

Then

$$e^{-at}M_i(t) = e^{-at}(p_ie^t + q_i)^r, \qquad (5.10)$$

$$\log m_i(a) = (r - a)\log[rq_i/(r - a)] + a\log[rp_i/a], \qquad (5.11)$$

$$\log\rho = r\{(1 - c)\log[q_0/(1 - c)] + c\log[p_0/c]\}, \qquad (5.12)$$

where

$$c = \frac{a}{r} = \frac{\log(q_0/q_1)}{\log(q_0/q_1) + \log(p_1/p_0)}. \qquad (5.13)$$

Note that as $p_1$ approaches $p_0$, $\log\rho \approx -r(p_1 - p_0)^2/8p_0q_0$.
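A quick numerical check of (5.12) and (5.13): the sketch below (illustrative parameters of our choosing) finds the crossing of $m_0$ and $m_1$ from (5.11) by bisection, compares it with the closed form, and checks the approximation $-r(p_1 - p_0)^2/8p_0q_0$ for nearby $p_1$.

```python
import math


def log_m_binomial(a: float, r: int, p: float) -> float:
    """log m_i(a) from (5.11) for X ~ Binomial(r, p), 0 < a < r."""
    q = 1.0 - p
    return (r - a) * math.log(r * q / (r - a)) + a * math.log(r * p / a)


def binomial_log_index(r: int, p0: float, p1: float, iters: int = 200) -> float:
    """log rho from the crossing m0(a) = m1(a) on [r p0, r p1], located by bisection."""
    lo, hi = r * p0, r * p1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if log_m_binomial(mid, r, p0) > log_m_binomial(mid, r, p1):
            lo = mid
        else:
            hi = mid
    return log_m_binomial((lo + hi) / 2, r, p0)


def binomial_log_index_formula(r: int, p0: float, p1: float) -> float:
    """(5.12)-(5.13) in closed form."""
    q0, q1 = 1.0 - p0, 1.0 - p1
    c = math.log(q0 / q1) / (math.log(q0 / q1) + math.log(p1 / p0))
    return r * ((1 - c) * math.log(q0 / (1 - c)) + c * math.log(p0 / c))


r, p0, p1 = 10, 0.4, 0.5                      # illustrative values
print(binomial_log_index(r, p0, p1), binomial_log_index_formula(r, p0, p1))
p1_close = 0.401
print(binomial_log_index_formula(r, p0, p1_close),
      -r * (p1_close - p0) ** 2 / (8 * p0 * (1 - p0)))
```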


6. Normality approximation. In this section we shall develop some results concerning the conjecture made in the introduction, that if the hypotheses $H_0$ and $H_1$ are very close to one another one may approximate $\rho$ by assuming that $X$ is normally distributed. To this end we shall first investigate more closely the behaviour of $m(a)$ and $t(a)$.

NOTATION 5. Let $N(t) = E(Xe^{tX})$ and $P(t) = E(X^2e^{tX})$. Let

$$t_0 = \text{g.l.b.}\{t : M(t) < \infty\}$$

and if $t_0 < 0$ let

$$a_1 = \inf_{t_0 < t < 0} N(t)/M(t). \qquad (6.1)$$

Note that if $E(X) > -\infty$, $a_1 < E(X)$ except in the case where $P(X = a_*) = 1$. Furthermore, if $a_* > -\infty$, then $t_0 = -\infty$.
LEMMA 7. If $M(t) < \infty$ for some $t < 0$, and $a_* < E(X)$, then for $a_1 < a < E(X)$

$$a = N(t(a))/M(t(a)), \qquad (6.2)$$

$$\frac{d}{da}[\log m(a)] = -t(a), \qquad (6.3)$$

$$\frac{dt(a)}{da} = \frac{M(t)^2}{M(t)P(t) - N(t)^2}\Bigr|_{t = t(a)}. \qquad (6.4)$$

If in addition $\mu = E(X)$ and $\sigma^2 = E[(X - \mu)^2]$ are finite, then (6.2), (6.3), and (6.4) hold for $a_1 < a \le E(X)$, giving

$$\frac{d}{da}[\log m(a)]\Bigr|_{a=\mu} = 0 \qquad (6.5)$$

and

$$\frac{dt(a)}{da}\Bigr|_{a=\mu} = 1/\sigma^2. \qquad (6.6)$$
PROOF. Suppose that $t_0 < t < 0$. Using Lemma 3, there is a unique $a$ so that $t = t(a)$, and this value of $a$ is obtained by

$$\frac{d}{dt}\bigl[e^{-at}M(t)\bigr] = \int (x - a)e^{t(x-a)}\,dF(x) = 0 \qquad (6.7)$$

and is given by

$$a = N(t)/M(t). \qquad (6.8)$$

Considering $a$ as a function of $t$, we may differentiate:

$$\frac{da}{dt} = \frac{M(t)P(t) - N(t)^2}{M(t)^2}.$$

Applying Schwarz' Inequality, the numerator is at least zero. It can vanish only if $Xe^{tX/2}$ and $e^{tX/2}$ are proportional with probability one. This can occur only if $P[X = a_*] = 1$. This case is excluded by the hypothesis $a_* < E(X)$. Hence $da/dt > 0$. Furthermore, as $t \to 0$, $a \to E(X)$. Hence as $t$ varies over $(t_0, 0)$, $a$ ranges continuously (and monotonically) over $(a_1, E(X))$. Equations (6.2) and (6.4) are immediately valid. Equation (6.3) is obtained by differentiating $\log m(a) = -a\,t(a) + \log M(t(a))$ with respect to $a$: the result is $-t(a) + [N(t(a))/M(t(a)) - a]\,dt(a)/da$, and the bracket vanishes by (6.2). Equations (6.5) and (6.6) follow from the Lebesgue Convergence Theorem [3] and the fact that if $f(x)$ is continuous at $x = a$, and $f'(x) \to b$ as $x \to a$, then $f'(a) = b$.
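Lemma 7 lends itself to a finite-difference check. The sketch below uses a hypothetical three-point distribution of our choosing. It solves (6.2) for $t(a)$ by bisection, which is legitimate because the proof shows $N(t)/M(t)$ is increasing in $t$. It then compares numerical derivatives of $\log m(a)$ and $t(a)$ with (6.3) and (6.4), and at $a = E(X)$ checks (6.5) and (6.6), i.e. $t = 0$ and $dt/da = 1/\sigma^2$.

```python
import math

XS = [-1.0, 0.0, 2.0]   # hypothetical support of X
PS = [0.3, 0.5, 0.2]    # hypothetical probabilities: E(X) = 0.1, Var(X) = 1.09


def mgf_parts(t):
    """M(t), N(t), P(t) of Notation 5 for the discrete distribution above."""
    M = sum(p * math.exp(t * x) for x, p in zip(XS, PS))
    N = sum(p * x * math.exp(t * x) for x, p in zip(XS, PS))
    P = sum(p * x * x * math.exp(t * x) for x, p in zip(XS, PS))
    return M, N, P


def t_of_a(a, lo=-60.0, hi=60.0, iters=200):
    """Solve a = N(t)/M(t), eq. (6.2), by bisection (valid for min(XS) < a < max(XS))."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        M, N, _ = mgf_parts(mid)
        if N / M < a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def log_m(a):
    """log m(a) = -a t(a) + log M(t(a)), from (4.4)."""
    t = t_of_a(a)
    return -a * t + math.log(mgf_parts(t)[0])


def dt_da_formula(a):
    """Right-hand side of (6.4), evaluated at t = t(a)."""
    M, N, P = mgf_parts(t_of_a(a))
    return M * M / (M * P - N * N)


a, h = -0.3, 1e-5
print((log_m(a + h) - log_m(a - h)) / (2 * h), -t_of_a(a))          # (6.3)
print((t_of_a(a + h) - t_of_a(a - h)) / (2 * h), dt_da_formula(a))   # (6.4)
print(t_of_a(0.1), dt_da_formula(0.1), 1 / 1.09)                     # (6.5)-(6.6)
```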

If $\nu_0$ and $\nu_1$ are any two probability measures defined on the same Borel field, we may introduce the measure $\nu = (\nu_0 + \nu_1)/2$. A consequence of the Radon-Nikodym Theorem [3] is the existence of two densities $f_0$ and $f_1$ (unique except possibly on a set of $\nu$ measure zero) so that

$$f_0(x) \ge 0, \qquad f_1(x) \ge 0, \qquad f_0(x) + f_1(x) = 2, \qquad (6.9)$$

$$\nu_i(E) = \int_E f_i(x)\,d\nu(x), \qquad i = 0, 1. \qquad (6.10)$$
Hence, except on a set of $\nu$ measure zero, at least one of $f_0(x)$ and $f_1(x)$ is non-zero, and the log of the likelihood ratio may be defined by $\log f_1(x) - \log f_0(x)$.

NOTATION 6. The outcome of an experiment is denoted by $Y$ and has a probability distribution given by equation (6.10) under hypothesis $H_i$. When an integration sign is unaccompanied by a region of integration it is to be understood that the region is the set of all possible values of $Y$. We shall deal with a chance variable $X$ which is a function of $Y$. In particular the log of the likelihood ratio is defined by $\log f_1(Y) - \log f_0(Y)$. Let

$$M_i(t) = E(e^{tX} \mid H_i), \qquad (6.11)$$

$$N_i(t) = E(Xe^{tX} \mid H_i), \qquad (6.12)$$

$$P_i(t) = E(X^2e^{tX} \mid H_i). \qquad (6.13)$$

We use $m_i(a)$ and $t_i(a)$ to represent the functions $m(a)$ and $t(a)$ under hypothesis $H_i$.


LEMMA 8. If $X$ is the log of the likelihood ratio, $X \not\equiv 0$, and $X$ is finite with probability one, then

$$M_1(t) = M_0(t + 1), \qquad N_1(t) = N_0(t + 1). \qquad (6.14)$$

As $a$ varies from $\mu_0$ to $\mu_1$, $t_0(a)$ varies continuously from 0 to 1 and

$$t_0(a) = t_1(a) + 1, \qquad (6.15)$$

$$\rho = \inf_{0 < t < 1} M_0(t). \qquad (6.16)$$


PROOF. We note that

$$M_1(t) = \int\Bigl[\frac{f_1(x)}{f_0(x)}\Bigr]^t f_1(x)\,d\nu(x) = \int\Bigl[\frac{f_1(x)}{f_0(x)}\Bigr]^{t+1}f_0(x)\,d\nu(x) = M_0(t + 1) \qquad (6.17)$$

and

$$N_1(t) = \int\log\Bigl[\frac{f_1(x)}{f_0(x)}\Bigr]\Bigl[\frac{f_1(x)}{f_0(x)}\Bigr]^t f_1(x)\,d\nu(x) = N_0(t + 1). \qquad (6.18)$$

It is evident that

$$M_0(0) = M_0(1) = M_1(0) = M_1(-1) = 1. \qquad (6.19)$$

It follows that $N_0(t)$ is finite for $0 < t < 1$, that $\mu_1 > 0$, $\mu_0 < 0$, and

$$\lim_{t\to 0^-} N_1(t) = N_1(0) = \mu_1, \qquad (6.20)$$

$$\lim_{t\to -1^+} N_1(t) = N_1(-1) = \mu_0. \qquad (6.21)$$

Applying Lemma 7, we find that as $a$ varies from $\mu_0$ to $\mu_1$, $t_1(a)$ varies continuously and (strictly) monotonically from $-1$ to 0. Similarly, $t_0(a)$ varies from 0 to 1. Applying equations (6.2), (6.17), and (6.18),

$$\frac{N_0[t_1(a) + 1]}{M_0[t_1(a) + 1]} = \frac{N_0[t_0(a)]}{M_0[t_0(a)]} = a \quad\text{for } \mu_0 < a < \mu_1. \qquad (6.22)$$

Equation (6.15) follows.
Since $e^{-at_0(a)}M_0(t_0(a))$ and $e^{-at_1(a)}M_1(t_1(a))$ are equal at $a = 0$ and continuous there (indeed, (6.14) and (6.15) give $m_1(a) = e^{a}m_0(a)$), the monotonicity properties of Lemma 6 show that

$$\rho = M_0(t_0(0)) = m_0(0) = m_1(0) = \inf_{0 < t < 1} M_0(t), \qquad (6.23)$$

$$\rho = \inf_{0 < t < 1}\int [f_1(x)]^t[f_0(x)]^{1-t}\,d\nu(x). \qquad (6.24)$$

We are interested in likelihood ratio tests for which $\mu_0^2 + \sigma_0^2$ is very small. The following theorem applies to certain classes of tests. In this theorem we are interested in classes of tests where the log of the likelihood ratio has finite means. Hence the restriction of Lemma 8 that $X$ is finite with probability one is automatically satisfied. However, the case where $X$ may assume the values $+\infty$ or $-\infty$ with positive probability is of some interest. For this case the above sort of reasoning applies, except that all integrals must be taken over the set

$$G = \{x : -\infty < \log f_1(x) - \log f_0(x) < \infty\}.$$

After the necessary modifications are made, it is seen that (6.24) is valid in general.
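For discrete distributions, (6.24) is a one-dimensional convex minimisation. The sketch below uses hypothetical three-point densities $f_0, f_1$ of our own choosing. It computes $\rho$ from (6.24) and checks it against $m_0(0)$ and $m_1(0)$ for $X = \log f_1(Y) - \log f_0(Y)$, together with $t_0 = t_1 + 1$ from (6.15). Then, for a second pair of very close densities, it compares $\log\rho$ with $-\sigma_0^2/8$, the leading term that Theorem 4 predicts.

```python
import math


def minimise_mgf(weights, xs, lo=-50.0, hi=50.0, iters=200):
    """Return (t, value) minimising sum_i w_i e^{t x_i}; bisection on the increasing
    derivative. Needs min(xs) < 0 < max(xs)."""
    def deriv(t):
        return sum(w * x * math.exp(t * x) for w, x in zip(weights, xs))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if deriv(mid) < 0:
            lo = mid
        else:
            hi = mid
    t = (lo + hi) / 2
    return t, sum(w * math.exp(t * x) for w, x in zip(weights, xs))


def chernoff_rho(f0, f1, iters=200):
    """(6.24): inf over 0 < t < 1 of sum_y f1(y)^t f0(y)^(1-t), by ternary search."""
    def g(t):
        return sum(b ** t * a ** (1 - t) for a, b in zip(f0, f1))

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        t1, t2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(t1) < g(t2):
            hi = t2
        else:
            lo = t1
    return g((lo + hi) / 2)


f0 = [0.5, 0.3, 0.2]                              # hypothetical densities
f1 = [0.1, 0.4, 0.5]
X = [math.log(b / a) for a, b in zip(f0, f1)]     # log likelihood ratio
t0, m0 = minimise_mgf(f0, X)                      # t_0(0) and m_0(0)
t1, m1 = minimise_mgf(f1, X)                      # t_1(0) and m_1(0)
print(chernoff_rho(f0, f1), m0, m1, t0 - t1)

eps = 1e-3                                        # a hypothetical pair of close hypotheses
g0 = [0.5, 0.3, 0.2]
g1 = [0.5 + eps, 0.3, 0.2 - eps]
Xc = [math.log(b / a) for a, b in zip(g0, g1)]
mean0 = sum(a * x for a, x in zip(g0, Xc))
var0 = sum(a * (x - mean0) ** 2 for a, x in zip(g0, Xc))
print(math.log(chernoff_rho(g0, g1)), -var0 / 8)
```

Bisection on the derivative is used for $t_0$ and $t_1$ because it pins down the minimiser itself to machine precision, whereas ternary search is enough when only the minimum value is needed.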
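THEOREM 4. If, for a class $C$ of likelihood ratio tests,

$$M_0(t) = 1 + \mu_0t + (\mu_0^2 + \sigma_0^2)t^2/2 + o(\mu_0^2 + \sigma_0^2), \qquad 0 \le t \le 1,$$

$$M_1(t) = 1 + \mu_1t + (\mu_1^2 + \sigma_1^2)t^2/2 + o(\mu_0^2 + \sigma_0^2), \qquad -1 \le t \le 0, \qquad (6.25)$$

then

$$\mu_1 = \sigma_0^2/2 + o(\sigma_0^2), \qquad \mu_0 = -\sigma_0^2/2 + o(\sigma_0^2), \qquad \sigma_1^2 = \sigma_0^2 + o(\sigma_0^2), \qquad (6.26)$$

$$\rho = e^{-\sigma_0^2/8} + o(\sigma_0^2) = e^{-\frac12[(\mu_1 - \mu_0)/(\sigma_1 + \sigma_0)]^2} + o\Bigl(\Bigl[\frac{\mu_1 - \mu_0}{\sigma_1 + \sigma_0}\Bigr]^2\Bigr). \qquad (6.27)$$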


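PROOF. Part 1. By Lemma 8,

$$M_0(t) = M_1(t - 1), \qquad (6.28)$$

so that, for $0 \le t \le 1$,

$$M_0(t) - M_1(t - 1) = \Bigl[\mu_1 - \frac{\mu_1^2 + \sigma_1^2}{2}\Bigr] + t\bigl[(\mu_0 - \mu_1) + (\mu_1^2 + \sigma_1^2)\bigr] + \frac{t^2}{2}\bigl[\mu_0^2 + \sigma_0^2 - \mu_1^2 - \sigma_1^2\bigr] = o(\mu_0^2 + \sigma_0^2). \qquad (6.29)$$

Since this holds for all $t$ in $[0, 1]$, each coefficient is $o(\mu_0^2 + \sigma_0^2)$. Hence

$$\mu_1^2 + \sigma_1^2 = \mu_0^2 + \sigma_0^2 + o(\mu_0^2 + \sigma_0^2),$$

$$\mu_1 = \sigma_0^2/2 + o(\mu_0^2 + \sigma_0^2),$$

$$\mu_0 = -\sigma_0^2/2 + o(\mu_0^2 + \sigma_0^2),$$

$$\sigma_1^2 = \sigma_0^2 + o(\mu_0^2 + \sigma_0^2). \qquad (6.30)$$

Equations (6.26) follow immediately.

Part 2. By Lemma 8, $\rho = \inf_{0<t<1} M_0(t)$. Minimizing the quadratic approximation we obtain

$$\rho = 1 - \frac{\mu_0^2}{2(\mu_0^2 + \sigma_0^2)} + o(\mu_0^2 + \sigma_0^2).$$

Applying the results of Part 1, Part 2 follows immediately.

We may also be interested in tests of the form

$$S_n = \sum_{j=1}^{n} X_j \le k,$$

where $X$ is a less efficient statistic than the log of the likelihood ratio. Here again, given a class of tests, we may investigate the behaviour of $\rho$ as the hypotheses get "close" together. For some such classes we state the following theorem.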
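THEOREM 5. If, for a class $C^*$ of tests,

$$\sigma_i^2\,\frac{dt_i(a)}{da} = 1 + o(1) \qquad (6.31)$$

as $\mu_1 - \mu_0 \to 0$ for $\mu_0 \le a \le \mu_1$, $i = 0, 1$, then

$$\rho = \exp\Bigl\{-\frac12\Bigl[\frac{\mu_1 - \mu_0}{\sigma_1 + \sigma_0}\Bigr]^2[1 + o(1)]\Bigr\}. \qquad (6.32)$$

PROOF. Integrating (6.3) with the help of (6.31), and using $t_i(\mu_i) = 0$,

$$\log m_0(a) = -\frac{(a - \mu_0)^2}{2}\Bigl[\frac{1}{\sigma_0^2} + o\Bigl(\frac{1}{\sigma_0^2}\Bigr)\Bigr],$$

$$\log m_1(a) = -\frac{(\mu_1 - a)^2}{2}\Bigl[\frac{1}{\sigma_1^2} + o\Bigl(\frac{1}{\sigma_1^2}\Bigr)\Bigr]. \qquad (6.33)$$

Equating the main terms, one obtains

$$\log\rho = -\frac12\Bigl[\frac{\mu_1 - \mu_0}{\sigma_1 + \sigma_0}\Bigr]^2[1 + o(1)]. \qquad (6.34)$$

It may be seen that the corresponding value of $a$ satisfies, to the main order, $a = (\sigma_1\mu_0 + \sigma_0\mu_1)/(\sigma_1 + \sigma_0)$. Finally we note that equation (6.4) is useful in checking the applicability of Theorem 5.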
7. Measures of information and divergence. In Section 4 the measure of efficiency $e$ was defined so that $n$ observations for one test are equivalent to $en$ observations for the second test (equivalent from the point of view of the criterion we used). It is evident that it would have been appropriate to use the following equation

$$I(X) = -\log\rho \qquad (7.1)$$

to indicate that $-\log\rho$ may be used as a measure of the information per observation for a test based on sums of observations on $X$. (Here $X$ denotes the two specified chance variables associated with $H_0$ and $H_1$, respectively.) In addition we may have written

$$D(Y) = -\log\Bigl[\inf_{0<t<1}\int [f_1(y)]^t[f_0(y)]^{1-t}\,d\nu(y)\Bigr] \qquad (7.2)$$

to indicate that $-\log\rho$ for the likelihood ratio test may be used as a measure of the divergence between the two distributions associated with $Y$. Let $(Y_1, Y_2)$ represent an observation consisting of independent observations on $Y_1$ and $Y_2$, respectively. Then it is easy to see from equation (6.24) that

$$D(Y_1, Y_2) \le D(Y_1) + D(Y_2) \qquad (7.3)$$

and

$$D(Y, Y) = 2D(Y), \qquad (7.4)$$

since the integral in (7.2) for independent observations is the product of the corresponding integrals, and the infimum of a product is at least the product of the infima, with equality when both factors are minimized at the same $t$. A measure of divergence used by Kullback and Leibler [5, 6] yields equality in the relation (7.3). The measure (7.2) and that used by Kullback and Leibler are basically two different functionals on the curve relating the type 1 and type 2 errors for likelihood ratio tests.
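Relations (7.3) and (7.4) can be checked directly for discrete distributions. In the sketch below (the distributions are hypothetical, chosen by us), the joint density of two independent observations is an outer product, so the integral in (7.2) factors. The divergence $D$ comes out subadditive and doubles exactly for two copies of the same $Y$, while the Kullback-Leibler divergence, included for contrast, is additive.

```python
import math
from itertools import product


def divergence_D(f0, f1, iters=200):
    """D(Y) of (7.2): -log inf over 0 < t < 1 of sum_y f1(y)^t f0(y)^(1-t)."""
    def g(t):
        return sum(b ** t * a ** (1 - t) for a, b in zip(f0, f1))

    lo, hi = 0.0, 1.0
    for _ in range(iters):  # ternary search; g is convex in t
        t1, t2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(t1) < g(t2):
            hi = t2
        else:
            lo = t1
    return -math.log(g((lo + hi) / 2))


def kl(f1, f0):
    """Kullback-Leibler divergence sum_y f1(y) log(f1(y) / f0(y))."""
    return sum(b * math.log(b / a) for a, b in zip(f0, f1))


def joint(fa, fb):
    """Density of two independent observations, flattened into one list."""
    return [x * y for x, y in product(fa, fb)]


# Hypothetical (f0, f1) pairs for Y1 and Y2, with different optimal t in (7.2)
y1 = ([0.9, 0.1], [0.1, 0.9])
y2 = ([0.99, 0.01], [0.5, 0.5])
d12 = divergence_D(joint(y1[0], y2[0]), joint(y1[1], y2[1]))
print(d12, divergence_D(*y1) + divergence_D(*y2))                                      # (7.3)
print(divergence_D(joint(y1[0], y1[0]), joint(y1[1], y1[1])), 2 * divergence_D(*y1))  # (7.4)
print(kl(joint(y1[1], y2[1]), joint(y1[0], y2[0])), kl(y1[1], y1[0]) + kl(y2[1], y2[0])) # KL additivity
```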
ORTHOGONAL ARRAYS OF STRENGTH TWO AND THREE
By R. C. Bose and K. A. Bush

University of North Carolina and University of Illinois

1. Summary. Orthogonal arrays can be regarded as natural generalizations of orthogonal Latin squares, and are useful in various problems of experimental design. In this paper the known upper bounds for the maximum possible number of constraints for arrays of strength 2 and 3 have been improved, and certain methods for constructing these arrays have been given.


2. Introduction. A $k \times N$ matrix $A$, with entries from a set $\Sigma$ of $s \ge 2$ elements, is called an orthogonal array of strength $t$, size $N$, $k$ constraints and $s$ levels if each $t \times N$ submatrix of $A$ contains all possible $t \times 1$ column vectors with the same frequency $\lambda$. The array may be denoted by $(N, k, s, t)$. The number $\lambda$ may be called the index of the array. Clearly $N = \lambda s^t$.

The set $\Sigma$ will for convenience be taken as the set of integers $0, 1, 2, \ldots, s - 1$. For example, the orthogonal array $(18, 7, 3, 2)$ with index 2 is given below. It is easy to verify that in any $2 \times 18$ submatrix, each of the column vectors $(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)$ occurs twice.


(2.0)  [Orthogonal array $(18, 7, 3, 2)$ of index 2: a $7 \times 18$ matrix with entries 0, 1, 2.]


If the orthogonal array $A$ is of strength $t$, so is any subarray of $k'$ rows (constraints) if $k' < k$. Hence the non-existence of $(\lambda s^t, k', s, t)$ automatically implies the non-existence of $(\lambda s^t, k, s, t)$ if $k > k'$. Again, if $A$ is of strength $t$, it is also of strength $t'$ for all $t' \le t$.

The optimum multifactorial designs considered by Plackett and Burman [1] are essentially orthogonal arrays of strength 2. They have shown that the maximum number of constraints $k$ for an orthogonal array of size $\lambda s^2$, $s$ levels and strength 2 satisfies the inequality

$$k \le \Bigl[\frac{\lambda s^2 - 1}{s - 1}\Bigr], \qquad (2.1)$$

where $[x]$ is the largest integer not exceeding $x$. The square bracket is used in this sense throughout this paper.
The existence of the orthogonal array $(s^2, k, s, 2)$ is combinatorially equivalent to the existence of a set of $k - 2$ mutually orthogonal $s \times s$ Latin squares (such a set is usually said to have $k$ constraints, represented by rows, columns and the $k - 2$ squares). The inequality (2.1) for the case $\lambda = 1$ states the well known fact that the maximum number of mutually orthogonal $s \times s$ Latin squares cannot exceed $s - 1$.
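The definition and the case $\lambda = 1$ can be illustrated concretely. The sketch below (function names are ours) tests the defining property of an orthogonal array directly, with rows as constraints and columns as runs, as in the text. It then builds an array $(p^2, p + 1, p, 2)$ for a prime $p$ from the linear functions $x$ and $mx + y \pmod p$, $m = 0, \ldots, p - 1$. This is the familiar prime-field construction of $p - 1$ mutually orthogonal Latin squares, so $k = p + 1$ attains the bound (2.1) with $\lambda = 1$. It is an illustration only, not the construction method developed in this paper.

```python
from collections import Counter
from itertools import combinations, product


def is_orthogonal_array(rows, s, t):
    """True if every t x N submatrix of the k x N array `rows` (symbols 0..s-1) contains
    each of the s**t possible t x 1 column vectors equally often."""
    if any(v not in range(s) for row in rows for v in row):
        return False
    n_cols = len(rows[0])
    if n_cols % s ** t:
        return False
    lam = n_cols // s ** t                       # the index lambda = N / s^t
    for chosen in combinations(rows, t):
        counts = Counter(zip(*chosen))           # column vectors of the t x N submatrix
        if len(counts) != s ** t or any(c != lam for c in counts.values()):
            return False
    return True


def linear_oa(p):
    """Illustrative (p^2, p + 1, p, 2) for prime p: columns indexed by (x, y) in Z_p x Z_p,
    rows given by x and by m*x + y mod p for m = 0, ..., p - 1."""
    cols = list(product(range(p), repeat=2))
    rows = [[x for x, _ in cols]]
    rows += [[(m * x + y) % p for x, y in cols] for m in range(p)]
    return rows


for p in (2, 3, 5, 7):
    A = linear_oa(p)
    print(p, len(A), is_orthogonal_array(A, p, 2), (p * p - 1) // (p - 1))
```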

Again, for the special case $s = 2$, (2.1) gives $k \le 4\lambda - 1$. Plackett and Burman have actually constructed orthogonal arrays $(4\lambda, 4\lambda - 1, 2, 2)$ for all values of $\lambda \le 25$ except $\lambda = 23$. They also give a number of arrays of strength 2 for other values of $s$, and establish a connection between orthogonal arrays and affine resolvable balanced incomplete block designs [2], and between orthogonal arrays and partially balanced designs [3].

Rao [4] studies hypercubes of strength $d$, which are orthogonal arrays for which the index is a power of $s$. He has used them in connection with confounded factorial designs. The concept of orthogonal arrays in its most general form is also due to Rao [5]. He discusses the use of these arrays together with some methods of constructing them and gives the following generalization of the inequality of Plackett and Burman.

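THEOREM. For an orthogonal array $(\lambda s^t, k, s, t)$, $t \ge 2$, the number of constraints $k$ satisfies the inequality

$$\lambda s^t - 1 \ge \binom{k}{1}(s - 1) + \binom{k}{2}(s - 1)^2 + \cdots + \binom{k}{u}(s - 1)^u \quad\text{if } t = 2u, \qquad (2.2)$$

$$\lambda s^t - 1 \ge \binom{k}{1}(s - 1) + \cdots + \binom{k}{u}(s - 1)^u + \binom{k-1}{u}(s - 1)^{u+1} \quad\text{if } t = 2u + 1. \qquad (2.3)$$

When $t = 2$, this leads to Plackett and Burman's inequality (2.1). When $t = 3$, we get

COROLLARY. For an orthogonal array $(\lambda s^3, k, s, 3)$ of strength 3, the number of constraints $k$ satisfies the inequality

$$k \le \Bigl[\frac{\lambda s^2 - 1}{s - 1}\Bigr] + 1. \qquad (2.4)$$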
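The reduction of Rao's inequality to (2.1) and (2.4) can be checked mechanically. The sketch below (function names are ours) finds the largest $k$ permitted by (2.2) or (2.3) by increasing $k$ until the inequality fails; the right-hand side grows with $k$, so the first failure marks the maximum. For $t = 2$ and $t = 3$ the result matches the closed forms (2.1) and (2.4).

```python
from math import comb


def rao_bound_holds(k, lam, s, t):
    """Check (2.2) (t = 2u) or (2.3) (t = 2u + 1) for a candidate number of constraints k."""
    u = t // 2
    rhs = sum(comb(k, i) * (s - 1) ** i for i in range(1, u + 1))
    if t % 2 == 1:
        rhs += comb(k - 1, u) * (s - 1) ** (u + 1)
    return lam * s ** t - 1 >= rhs


def rao_max_k(lam, s, t):
    """Largest k allowed by Rao's inequality; the right-hand side increases with k."""
    k = t                      # the full factorial (s^t, t, s, t) always satisfies the bound
    while rao_bound_holds(k + 1, lam, s, t):
        k += 1
    return k


for lam in (1, 2, 3):
    for s in (2, 3, 4, 5):
        print(lam, s,
              rao_max_k(lam, s, 2), (lam * s * s - 1) // (s - 1),       # (2.1)
              rao_max_k(lam, s, 3), (lam * s * s - 1) // (s - 1) + 1)   # (2.4)
```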


Theorems 1A and 2A proved in Sections 3 and 4 give an alternative proof of 
the inequalities (2.1) and (2.4). Theorems 1B, 2B, 2C improve these inequalities 
except for certain special values of s. 

Sections 5, 6 and 7 are devoted to the investigation of methods for constructing orthogonal arrays of strength 2. A difference theorem is proved, which when used in conjunction with Galois fields enables the construction of the arrays $(18, 7, 3, 2)$ and $(32, 9, 4, 2)$. The first of these has been constructed by Burman [1] by trial and error methods. It is shown that if $p$ is prime and $s = p^n$, $\lambda = p^u$, $[u/n] = c$, then we can construct an orthogonal array $(\lambda s^2, k, s, 2)$, where $k = \{\lambda(s^{c+1} - 1)/(s^c - s^{c-1})\} + 1$.

Theorems 5A and 5B of Section 8 establish a connection between orthogonal arrays and the theory of confounding in symmetrical factorial designs (based on the use of finite projective geometries) first developed by Bose and Kishen [6] and later amplified by Bose [7]. It is shown that the problem of constructing the orthogonal array $(s^r, k, s, t)$, $r \ge t$, $s = p^n$, and the problem of obtaining a symmetrical factorial design with $s$ levels and $k$ factors, in which the block size is $s^r$ and in which all $t$-factor and lower order interactions are left unconfounded, both depend on finding a set of $k$ points in $PG(r - 1, p^n)$ no $t$ of which are conjoint. Such sets have been obtained by Bose in [7] and his results can be immediately translated into the language of orthogonal arrays. This has been done in Section 9. Theorems 5A and 5B were given by Bush in his unpublished thesis [8]. It has recently come to our notice that Rao [9] independently obtained a theorem equivalent to 5A, and derived from it the array $(2^r, 2^{r-1}, 2, 3)$ given by us in Section 9(a). The results in paragraphs (b) and (c) of Section 9 are new. An improvement of the inequalities (2.2) and (2.3) has been given by Bush [8, 10] for the special case $\lambda = 1$.


3. Upper bound for the number of constraints for orthogonal arrays of strength 2. Two columns of an orthogonal array are said to have $i$ coincidences if there are exactly $i$ rows in which the symbols appearing in the two columns have the same value (i.e., are the same elements of $\Sigma$). For example, the first column in the array (2.0) has 1 coincidence with each of the second and third columns, but has 3 coincidences with the fourth.

For any orthogonal array $(N, k, s, t)$ of index λ let $n_i$ denote the number of columns (other than the first) which have $i$ coincidences with the first column. Since the total number of columns is $N = \lambda s^t$,

$$\sum_{i=0}^{k} n_i = \lambda s^t - 1. \tag{3.0a}$$

We shall show that

$$\sum_{i=0}^{k} i(i-1)\cdots(i-h+1)\,n_i = k(k-1)\cdots(k-h+1)\,(\lambda s^{t-h} - 1), \qquad 1 \le h \le t. \tag{3.0b}$$

The formula (3.0a) can be regarded as a degenerate case of (3.0b) for $h = 0$.

Consider the subarray obtained by choosing any $h$ rows of $(N, k, s, t)$. Since an array of strength $t$ and index λ is also of strength $h$ and index $\lambda s^{t-h}$ for every $h \le t$, the first column vector of this subarray appears in exactly $\lambda s^{t-h} - 1$ other columns of this subarray. Since it is possible to choose the subarray in $\binom{k}{h}$ different ways, the total number of $h \times 1$ vectors appearing in columns other than the first which are identical with the corresponding vector of the first column is $(\lambda s^{t-h} - 1)\binom{k}{h}$. But any column which has $i$ coincidences with the first contributes nothing or $\binom{i}{h}$ to this number according as $i < h$ or $i \ge h$. Hence

$$\sum_{i=0}^{k} n_i \binom{i}{h} = \binom{k}{h}(\lambda s^{t-h} - 1), \tag{3.0c}$$

where $\binom{i}{h}$ is to be interpreted as zero if $i < h$. Multiplying both sides by $h!$ gives (3.0b).
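The counting identity (3.0c) is easy to check numerically. The following is a minimal Python sketch of our own, not part of the original argument: the helper names `linear_oa`, `coincidence_counts` and `check_3_0c` are hypothetical, arrays are stored as lists of $k$ rows as in the paper, and the two test arrays, (9, 4, 3, 2) and (8, 4, 2, 3), are generated with the linear construction of Theorem 5A over a prime field.

```python
from itertools import product
from math import comb


def linear_oa(C, p):
    """Hypothetical helper: the k x p^r array whose columns are C*xi (mod p)
    for every xi in GF(p)^r, p prime (the construction of Theorem 5A)."""
    r = len(C[0])
    columns = [[sum(c * x for c, x in zip(row, xi)) % p for row in C]
               for xi in product(range(p), repeat=r)]
    return [list(row) for row in zip(*columns)]  # k rows, as in the paper


def coincidence_counts(A):
    """n[i] = number of columns other than the first that agree with the
    first column in exactly i rows."""
    k, N = len(A), len(A[0])
    n = [0] * (k + 1)
    for j in range(1, N):
        n[sum(A[row][j] == A[row][0] for row in range(k))] += 1
    return n


def check_3_0c(A, s, t, lam):
    """Compare both sides of (3.0c) for h = 0, 1, ..., t."""
    k = len(A)
    n = coincidence_counts(A)
    return all(
        sum(n[i] * comb(i, h) for i in range(k + 1))  # comb(i, h) is 0 when i < h
        == comb(k, h) * (lam * s ** (t - h) - 1)
        for h in range(t + 1)
    )


# (9, 4, 3, 2): rows u, v, u+v, u+2v over GF(3); (8, 4, 2, 3): the matrix (9.1).
oa_9 = linear_oa([[1, 0], [0, 1], [1, 1], [1, 2]], 3)
oa_8 = linear_oa([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], 2)
print(coincidence_counts(oa_9), check_3_0c(oa_9, s=3, t=2, lam=1))
print(coincidence_counts(oa_8), check_3_0c(oa_8, s=2, t=3, lam=1))
```

In the (9, 4, 3, 2) array every other column agrees with the all-zero first column in exactly one row, so $n_1 = 8$; in the (8, 4, 2, 3) array six columns have two coincidences with the first and one has none.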
Let us now confine our attention to orthogonal arrays of strength 2. Then (3.0a) and (3.0b) lead to

$$\sum_{i=0}^{k} n_i = \lambda s^2 - 1, \tag{3.1a}$$

$$\sum_{i=0}^{k} i\,n_i = k(\lambda s - 1), \tag{3.1b}$$

$$\sum_{i=0}^{k} i(i-1)\,n_i = k(k-1)(\lambda - 1). \tag{3.1c}$$


Consider the function

$$f(x) = \sum_{i=0}^{k} (i - x)(i - 1 - x)\,n_i,$$

defined for integral values of $x$. Then

$$f(x) \ge 0 \tag{3.2}$$

since $n_i \ge 0$, and the factors $(i - x)$ and $(i - 1 - x)$ are both negative if $i < x$, and both positive if $i > x + 1$. Also one factor is zero if $i = x$ or $x + 1$. Now

$$f(x) = \sum_{i=0}^{k} i(i-1)\,n_i - 2x\sum_{i=0}^{k} i\,n_i + x(x+1)\sum_{i=0}^{k} n_i,$$

whence from (3.1) we get

$$f(x) = \lambda\{k(k-1) - 2kxs + x(x+1)s^2\} - \{k(k-1) - 2kx + x(x+1)\}.$$

From (3.2)

$$\lambda \ge \frac{k(k-1) - 2kx + x(x+1)}{k(k-1) - 2kxs + x(x+1)s^2}. \tag{3.3}$$

Setting

$$\alpha = k - 1 - xs,$$

we can, after some reduction, write (3.3) in the form

$$\frac{\lambda s^2 - 1}{s - 1} \ge k\left\{1 + \frac{\alpha(s - \alpha)}{D}\right\}, \tag{3.4}$$

where $D$, the denominator in (3.3), can be expressed in two equivalent forms

$$\begin{aligned} D &= (s-1)(k - \alpha - 1) + \alpha(\alpha + 1) && \text{(3.5a)}\\ &= k(s-1) - (\alpha + 1)(s - 1 - \alpha). && \text{(3.5b)} \end{aligned}$$


We shall now prove Plackett and Burman's inequality (2.1) for orthogonal arrays of strength 2, and then proceed to improve it if $\lambda - 1$ is not divisible by $s - 1$. Let

$$\lambda - 1 = a(s - 1) + b, \qquad 0 \le b < s - 1, \quad a \ge 0.$$





$12 R. C. BOSE AND K. A. BUSH 


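Therefore

$$\frac{\lambda s^2 - 1}{s - 1} = \lambda s + \lambda + a + \frac{b}{s - 1}. \tag{3.6}$$

Suppose there exists an array with $k = \lambda s + \lambda + a + 1$. Then

$$k - 1 = s(\lambda + a) + b + 1.$$

The integer $x$ is at our disposal. Let us choose $x = \lambda + a$; then $\alpha = b + 1$, so that $0 < \alpha < s$. From (3.5a) we have

$$D = s(s - 1)(\lambda + a) + \alpha(\alpha + 1) > 0,$$

so that

$$\frac{\alpha(s - \alpha)}{D} > 0.$$

Hence from (3.4) and (3.6)

$$\lambda s + \lambda + a + \frac{b}{s - 1} \ge k\left\{1 + \frac{\alpha(s - \alpha)}{D}\right\} > k = \lambda s + \lambda + a + 1,$$

that is, $b > s - 1$, which is a contradiction. Hence the value $k = \lambda s + \lambda + a + 1$ is inadmissible, and so are all higher values. This proves the inequality of Plackett and Burman.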

THEOREM 1A. For any orthogonal array $(\lambda s^2, k, s, 2)$ of strength 2, the number of constraints $k$ satisfies the inequality

$$k \le \frac{\lambda s^2 - 1}{s - 1}.$$


Consider now the case when $\lambda - 1$ is not divisible by $s - 1$, so that $0 < b < s - 1$. Let

$$k = \lambda s + \lambda + a - n, \qquad b > n \ge 0.$$

Therefore

$$k - 1 = s(\lambda + a) + b - n.$$

Choosing $x$ as before, we now have

$$0 < \alpha = b - n < s - 1.$$

Therefore

$$(\alpha + 1)(s - 1 - \alpha) > 0,$$

so that $D < k(s - 1)$ by (3.5b). Hence from (3.4) and (3.6)

$$\frac{\lambda s^2 - 1}{s - 1} > k + \frac{\alpha(s - \alpha)}{s - 1},$$

that is,

$$\frac{b}{s - 1} > -n + \frac{(b - n)(s - b + n)}{s - 1}.$$

Therefore

$$(b - n)(b + 1 - n) - s(b - 2n) > 0. \tag{3.7}$$


Hence if $n$ is any integer ($b > n \ge 0$) for which the relation (3.7) is contradicted, then the value $k = \lambda s + \lambda + a - n$ and all higher values are impossible. The first term in (3.7) is never negative, so that for $n > b/2$ this relation will never be contradicted. Hence we may drop the restriction $b > n$. The quadratic equation obtained by replacing the inequality by equality in (3.7), namely $n^2 + (2s - 2b - 1)n - b(s - 1 - b) = 0$, has one positive and one negative root, since the product of the roots is $-b(s - 1 - b)$ and $0 < b < s - 1$. The positive root may be written as

$$\theta = \frac{\sqrt{(2s - 2b - 1)^2 + 4b(s - 1 - b)} - (2s - 2b - 1)}{2}. \tag{3.8}$$

The largest value of $n$ which contradicts (3.7) is $[\theta]$. Hence we may state the following theorem.

THEOREM 1B. If $\lambda - 1 = a(s - 1) + b$, $0 < b < s - 1$, then for the orthogonal array $(\lambda s^2, k, s, 2)$ of strength 2, the number of constraints $k$ satisfies the inequality

$$k \le \left[\frac{\lambda s^2 - 1}{s - 1}\right] - [\theta] - 1, \tag{3.9}$$

where θ is the positive number given by (3.8).
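Theorems 1A and 1B reduce to integer arithmetic. The sketch below (the name `strength2_bound` is ours) avoids the square root in (3.8) by finding $[\theta]$ directly as the largest $n \ge 0$ at which the left side of (3.7), viewed as a quadratic in $n$, is not positive; it assumes $s \ge 2$.

```python
def strength2_bound(lam, s):
    """Hypothetical helper: the upper bound on k for an orthogonal array
    (lam*s^2, k, s, 2) given by Theorem 1A (b = 0) or Theorem 1B (b > 0)."""
    b = (lam - 1) % (s - 1)              # lam - 1 = a(s - 1) + b
    base = (lam * s * s - 1) // (s - 1)  # [(lam*s^2 - 1)/(s - 1)]
    if b == 0:
        return base                      # Theorem 1A: the division is exact

    def q(n):
        # Left side of (3.7), rearranged as n^2 + (2s - 2b - 1)n - b(s - 1 - b).
        return n * n + (2 * s - 2 * b - 1) * n - b * (s - 1 - b)

    theta_floor = 0                      # q(0) = -b(s - 1 - b) < 0
    while q(theta_floor + 1) <= 0:       # [theta] = largest n >= 0 with q(n) <= 0
        theta_floor += 1
    return base - theta_floor - 1        # inequality (3.9)


for lam, s in [(2, 3), (2, 4), (4, 8), (1, 5)]:
    print(lam, s, strength2_bound(lam, s))
```

For $(\lambda, s) = (2, 3)$ and $(2, 4)$ it returns 7 and 9, the values quoted for the arrays (18, 7, 3, 2) and (32, 9, 4, 2) in Sections 5 and 6; for $(\lambda, s) = (4, 8)$ it gives 34, a case in which $[\theta] = 1$.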


4. Upper bound for the number of constraints for orthogonal arrays of strength 3. Consider an array of strength 3, and let $n_i$ denote the number of columns (other than the first) which have $i$ coincidences with the first column. From (3.0a) and (3.0b)

$$\sum_{i=0}^{k} n_i = \lambda s^3 - 1, \tag{4.0a}$$

$$\sum_{i=0}^{k} i\,n_i = k(\lambda s^2 - 1), \tag{4.0b}$$

$$\sum_{i=0}^{k} i(i-1)\,n_i = k(k-1)(\lambda s - 1), \tag{4.0c}$$

$$\sum_{i=0}^{k} i(i-1)(i-2)\,n_i = k(k-1)(k-2)(\lambda - 1). \tag{4.0d}$$


If $x$ is any non-negative integer then

$$f(x) = \sum_{i=0}^{k} i(i - 1 - x)(i - 2 - x)\,n_i \ge 0, \tag{4.1}$$

since $n_i \ge 0$ and $(i - 1 - x)(i - 2 - x)$, a product of two consecutive integers, is never negative; whence from (4.0) we get

$$f(x) = k(k-1)(k-2)(\lambda - 1) - 2xk(k-1)(\lambda s - 1) + kx(x+1)(\lambda s^2 - 1) \ge 0. \tag{4.2}$$

Since $k \ge 1$, we have

$$\lambda \ge \frac{(k-1)(k-2) - 2x(k-1) + x(x+1)}{(k-1)(k-2) - 2x(k-1)s + x(x+1)s^2}, \tag{4.3}$$

which is the same as (3.3) with $k - 1$ instead of $k$. Hence, reasoning as before, we can prove the following theorems.


THEOREM 2A. For any orthogonal array $(\lambda s^3, k, s, 3)$ of strength 3, the number of constraints $k$ satisfies the inequality

$$k \le \left[\frac{\lambda s^2 - 1}{s - 1}\right] + 1. \tag{4.4}$$


THEOREM 2B. If $\lambda - 1 = a(s - 1) + b$, $0 < b < s - 1$, then for the orthogonal array $(\lambda s^3, k, s, 3)$ of strength 3, the number of constraints $k$ satisfies the inequality

$$k \le \left[\frac{\lambda s^2 - 1}{s - 1}\right] - [\theta], \tag{4.5}$$

where θ is the positive number given by (3.8).

Theorem 2A is the same as the Rao inequality (2.4), and Theorem 2B improves it for the case when $\lambda - 1$ is not divisible by $s - 1$.

We shall now show that when $\lambda - 1$ is divisible by $s - 1$, we can still improve the inequality of Theorem 2A, except in certain special cases. In fact we can state the following theorem.

THEOREM 2C. For any orthogonal array $(\lambda s^3, k, s, 3)$ of strength 3, if $\lambda - 1 = a(s - 1)$ and $(s - 1)^2(s - 2)$ is not divisible by $as + 2$, then the number of constraints $k$ satisfies the inequality

$$k \le \left[\frac{\lambda s^2 - 1}{s - 1}\right] - 1. \tag{4.6}$$

Now $[(\lambda s^2 - 1)/(s - 1)] = as^2 + s + 1$. If possible let $k = as^2 + s + 1$. Choose $x = as$. Then it is easy to verify from (4.2) that $f(x) = 0$. Hence

$$\sum_{i=0}^{k} i(i - 1 - as)(i - 2 - as)\,n_i = 0.$$

Since $n_i \ge 0$ and every term of this sum is non-negative, $n_i$ must vanish for all values of $i$ except $i = 0$, $as + 1$, $as + 2$. From (4.0b) and (4.0c), the latter divided by $as + 1$, we get

$$(as + 1)\,n_{as+1} + (as + 2)\,n_{as+2} = k(as^3 - as^2 + s^2 - 1),$$

$$as\,n_{as+1} + (as + 2)\,n_{as+2} = ks(as^2 - as + s - 1).$$

Solving, we get

$$n_{as+1} = k(s - 1),$$

$$n_{as+2} = \frac{ks(as^2 - 2as + a + s - 1)}{as + 2} = k(s - 1)^2 - s(s - 1)(s - 2) + \frac{(s - 1)^2(s - 2)}{as + 2}.$$

Since $n_{as+2}$ must be integral, we arrive at a contradiction if $(s - 1)^2(s - 2)$ is not divisible by $as + 2$. Hence in this case $k \le as^2 + s$ (a larger $k$ is impossible as well, since deleting rows from an orthogonal array does not reduce its strength).

Consider the special case $\lambda = s$. Then $a = 1$. Since $s \equiv -2 \pmod{s + 2}$, we have $(s - 1)^2(s - 2) \equiv -36 \pmod{s + 2}$; hence if $(s - 1)^2(s - 2)/(s + 2)$ is integral, then 36 must be divisible by $s + 2$. We can therefore state the following corollary to Theorem 2C.

COROLLARY. For the orthogonal array $(s^4, k, s, 3)$, if 36 is not divisible by $s + 2$, then the number of constraints $k$ cannot exceed $s^2 + s$.
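The three strength-3 bounds combine into one small function. In this sketch (the function names are hypothetical) Theorem 2B is applied when $b > 0$, Theorem 2C when $b = 0$ and its divisibility condition holds, and Theorem 2A otherwise. It also lists the prime powers $s$ below 200 for which $s + 2$ divides 36, that is, the levels excluded from the corollary.

```python
def strength3_bound(lam, s):
    """Hypothetical helper: upper bound on k for an orthogonal array
    (lam*s^3, k, s, 3) from Theorems 2A, 2B and 2C."""
    a, b = divmod(lam - 1, s - 1)             # lam - 1 = a(s - 1) + b
    base = (lam * s * s - 1) // (s - 1)       # [(lam*s^2 - 1)/(s - 1)]
    if b > 0:                                 # Theorem 2B, inequality (4.5)
        theta_floor = 0                       # [theta], found as in Theorem 1B
        while ((theta_floor + 1) ** 2 + (2 * s - 2 * b - 1) * (theta_floor + 1)
               - b * (s - 1 - b)) <= 0:
            theta_floor += 1
        return base - theta_floor
    if ((s - 1) ** 2 * (s - 2)) % (a * s + 2) != 0:
        return base - 1                       # Theorem 2C, inequality (4.6)
    return base + 1                           # Theorem 2A, inequality (4.4)


def is_prime_power(n):
    """True if n = p^m for a prime p and m >= 1."""
    if n < 2:
        return False
    p = next(d for d in range(2, n + 1) if n % d == 0)  # smallest prime factor
    while n % p == 0:
        n //= p
    return n == 1


# Levels s (prime powers below 200) with s + 2 dividing 36, i.e. those for
# which the corollary (lambda = s, a = 1) gives no improvement on Theorem 2A.
exceptional = [s for s in range(2, 200) if is_prime_power(s) and 36 % (s + 2) == 0]
print(exceptional)              # [2, 4, 7, 16], the values named in Section 9(c)
print(strength3_bound(3, 3))    # 12, the bound quoted for (81, k, 3, 3)
```

For $\lambda = s$ and $s \notin \{2, 4, 7, 16\}$ the function returns $s^2 + s$, in agreement with the corollary.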


5. The method of differences for constructing orthogonal arrays of strength 2. 
The method of differences has been elsewhere used [11] for constructing incom- 
plete block designs. Here we shall use it to construct orthogonal arrays 
of strength 2. 

Let $\lambda = \alpha\beta$. An orthogonal array $(\lambda s^2, k, s, 2)$ of strength 2 is said to be $\beta$-resolvable if it is the juxtaposition of $\alpha s$ different arrays $(\beta s, k, s, 1)$ of index β and strength 1. A 1-resolvable array is said to be completely resolvable. For example, the array (18, 6, 3, 2), obtained from (2.0) by deleting the last row, is completely resolvable.

If $\lambda = \alpha\beta$ and the orthogonal array $(\lambda s^2, k, s, 2)$ is $\beta$-resolvable, then we can add at least one more row and get an orthogonal array of $k + 1$ constraints. In the new row we have to put the first element of Σ in the columns belonging to the first α component arrays, the second element of Σ in the columns belonging to the next α components, and so on. As will be seen later, under appropriate circumstances it may be possible to add more than one row without destroying the orthogonality of the array.

THEOREM 3. Let $M$ be a module (additive group) consisting of $s$ elements, $e_0, e_1, \ldots, e_{s-1}$. Suppose it is possible to find a scheme of $r$ rows, with elements belonging to $M$,

$$\begin{matrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{r1} & a_{r2} & \cdots & a_{rn} \end{matrix} \tag{5.0}$$

such that among the differences of the corresponding elements of any two rows, each element of $M$ occurs exactly λ times ($n = \lambda s$); then a completely resolvable orthogonal array $(\lambda s^2, r, s, 2)$ of strength 2 can be constructed as follows. Write down the addition table of $M$. Then replace each element in the scheme by the row of the addition table corresponding to the element (using only the suffixes if the set Σ is taken as $0, 1, \ldots, s - 1$). This gives the completely resolvable array $(\lambda s^2, r, s, 2)$. A new row can be added to obtain an array $(\lambda s^2, r + 1, s, 2)$ of $r + 1$ constraints.

Before proceeding to a formal proof we shall illustrate the use of the theorem by constructing the orthogonal array (18, 7, 3, 2). For $M$ we take the Galois field $GF(3)$, whose elements are residue classes (mod 3). Let $e_0 = 0$, $e_1 = 1$, $e_2 = 2$. The addition table of $M$ is

$$\begin{array}{c|ccc} + & e_0 & e_1 & e_2 \\ \hline e_0 & e_0 & e_1 & e_2 \\ e_1 & e_1 & e_2 & e_0 \\ e_2 & e_2 & e_0 & e_1 \end{array} \tag{5.1}$$


It is not difficult to construct by trial a six rowed scheme 


&y Co Co & Co & 
0 €1 C2 € C2 


€o €1 Co Co C2 Cj 


C2 €2 Oo Gy C1 


€p €y C2 €1 Co C2 
&o C2 €y Oy C2 & 


where, among the differences of the corresponding elements in any two rows, each of the three elements $e_0$, $e_1$, $e_2$ occurs twice. In order to convert the scheme (5.2) into the completely resolvable orthogonal array (18, 6, 3, 2), we replace each element of $M$ by the suffixes in the corresponding row of the addition table (5.1). Thus

$$e_0 \to 0, 1, 2; \qquad e_1 \to 1, 2, 0; \qquad e_2 \to 2, 0, 1.$$

We thus obtain the first six rows of the array (2.0) given in the Introduction. Finally, to obtain the array (18, 7, 3, 2) we add a new row consisting of six zeros (occupying the columns of the first two groups), followed by six ones, followed by six twos. It should be noted that, from Theorem 1B, 7 is the maximum possible number of constraints for an array of size 18 and strength 2 with 3 levels.
We shall now proceed to a formal proof of Theorem 3. The $s^2$ vectors of size $2 \times 1$ whose components are elements of $M$ can be divided into $s$ classes, each class corresponding to one element of $M$: if $e_i - e_j = e_k$, then $(e_i, e_j)'$ belongs to the class corresponding to $e_k$. Now in the addition table of $M$ the difference of the corresponding elements of two different rows remains constant, so that the vectors formed from the rows corresponding to $e_i$ and $e_j$ consist of all vectors of the class corresponding to $e_k$. Since in our scheme, among the differences of corresponding elements of any two rows, each element of $M$ occurs just λ times, when our scheme is expanded and each element replaced by the corresponding row of the addition table, every vector will occur λ times. (Replacing the elements by the corresponding suffixes will change the set Σ from $M$ to the set $0, 1, 2, \ldots, s - 1$.)
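Theorem 3 is easy to mechanise. The sketch below is a minimal illustration, not the authors' procedure. It takes $M$ to be the integers mod 3 (the additive group of $GF(3)$). It finds a six-rowed scheme by a small backtracking search, which is our stand-in for "construction by trial", so its scheme need not coincide with (5.2). It then expands the scheme with the addition table as in Theorem 3, appends the extra row of six zeros, six ones and six twos, and checks strength 2. The normalisation (first row and first column zero) loses no generality: subtracting the first row from every row, or a constant from any single row, leaves the differences balanced. All function names are hypothetical.

```python
from itertools import combinations, product

S, LAM = 3, 2          # M = integers mod 3 (additive group of GF(3)); index 2
N_COLS = LAM * S       # n = lambda * s columns in the scheme


def balanced(u, v):
    """True if every element of M occurs exactly LAM times among u_j - v_j."""
    counts = [0] * S
    for x, y in zip(u, v):
        counts[(x - y) % S] += 1
    return all(c == LAM for c in counts)


ZERO = (0,) * N_COLS
# Normalised candidate rows: first entry 0 and balanced against the zero row.
CANDIDATES = [(0,) + rest for rest in product(range(S), repeat=N_COLS - 1)
              if balanced((0,) + rest, ZERO)]


def search(rows, start, target):
    """Backtracking stand-in for 'construction by trial' (hypothetical helper)."""
    if len(rows) == target:
        return rows
    for idx in range(start, len(CANDIDATES)):
        cand = CANDIDATES[idx]
        if all(balanced(cand, row) for row in rows):
            found = search(rows + [cand], idx + 1, target)
            if found:
                return found
    return None


def expand(scheme):
    """Theorem 3: replace each element e by its addition-table row e+0, ..., e+S-1."""
    return [[(e + c) % S for e in row for c in range(S)] for row in scheme]


def is_oa_strength2(A, s, lam):
    """Every ordered pair of symbols occurs exactly lam times in any two rows."""
    for r1, r2 in combinations(range(len(A)), 2):
        counts = {}
        for pair in zip(A[r1], A[r2]):
            counts[pair] = counts.get(pair, 0) + 1
        if len(counts) != s * s or any(c != lam for c in counts.values()):
            return False
    return True


scheme = search([ZERO], 0, 6)                  # a six-rowed scheme
array = expand(scheme)                         # completely resolvable (18, 6, 3, 2)
# Extra row: the first LAM groups of S columns get 0, the next LAM get 1, ...
extra = [g // LAM for g in range(N_COLS) for _ in range(S)]
full = array + [extra]                         # the array (18, 7, 3, 2)
for row in full:
    print("".join(map(str, row)))
```

Each group of three consecutive columns of `array` comes from one column of the scheme and contains every symbol once in each row, which is exactly the complete resolvability that lets the extra row be added.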


6. Construction of a completely resolvable array $(\lambda s^2, \lambda s, s, 2)$ of strength 2 and $\lambda s$ constraints, when the index λ and the number of levels $s$ are both powers of a prime $p$. Let $\lambda = p^u$, $s = p^v$. Consider the Galois field $GF(p^{u+v})$. The elements of the field can be expressed either as powers $x^i$ of a primitive element $x$ ($i = 0, 1, \ldots, p^{u+v} - 2$) together with the element zero, or as polynomials of degree at most $u + v - 1$ with coefficients from $GF(p)$, the field of residue classes (mod $p$). (For a brief exposition of these properties of Galois fields see [11] and [12].) To add two elements we use the polynomial form, adding the coefficients (mod $p$), and to multiply we use the power form, remembering the relation

$$x^{p^{u+v}} = x. \tag{6.0}$$
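For example if $p = 2$, $u = 1$, $v = 2$, we consider the Galois field $GF(2^3)$, whose elements may be exhibited (using the minimum function $x^3 + x^2 + 1$) as

$$\begin{aligned} a_0 &= 0, & a_4 &= x^2,\\ a_1 &= 1, & a_5 &= x^2 + 1 = x^3,\\ a_2 &= x, & a_6 &= x^2 + x = x^6,\\ a_3 &= x + 1 = x^5, & a_7 &= x^2 + x + 1 = x^4. \end{aligned} \tag{6.1}$$

We have ordered the elements of the field in what may be called the lexicographic order; that is, if $a_i = \alpha_2 x^2 + \alpha_1 x + \alpha_0$, then the integer $i$ is expressed as $\alpha_2\alpha_1\alpha_0$ in the scale of numeration with radix 2. The same is done for the general case $GF(p^{u+v})$. If

$$a_i = \alpha_{n-1}x^{n-1} + \cdots + \alpha_v x^v + \alpha_{v-1}x^{v-1} + \cdots + \alpha_1 x + \alpha_0, \tag{6.2}$$

then $i = \alpha_{n-1}\cdots\alpha_1\alpha_0$ in the scale of numeration with radix $p$, where $n = u + v$.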
Consider the sub-class $M$ of the elements of $GF(p^{u+v})$ for which the coefficients of $x^v$ and higher powers of $x$ are zero, when the element is expressed in the polynomial form. In our example the sub-class $M$ consists of the elements $a_0$, $a_1$, $a_2$, $a_3$. In general $M$ will consist of the first $p^v$ elements of $GF(p^{u+v})$ when they are arranged in the lexicographic order. We now establish a correspondence between the elements of the field and the elements of $M$ in the following manner: the element $a_i$ of $GF(p^{u+v})$ given by (6.2) corresponds to the element

$$a_j = \alpha_{v-1}x^{v-1} + \cdots + \alpha_1 x + \alpha_0 \tag{6.3}$$

of $M$, the coefficients of $x^{v-1}$ and lower powers of $x$ for $a_j$ being the same as the coefficients of the corresponding powers of $x$ in $a_i$. It is clear that $a_j$ is uniquely determined by $a_i$, and that

$$j \equiv i \pmod{p^v}, \qquad 0 \le j < p^v.$$

Conversely, to each $a_j$ of $M$ there correspond $p^u$ elements of $GF(p^{u+v})$, since if $a_j$ is given by (6.3), then for $a_i$ the coefficients $\alpha_{n-1}, \ldots, \alpha_v$ are arbitrary, each taking $p$ possible values. It should be noticed that $M$ is a direct factor module in $GF(p^{u+v})$ and that the correspondence used by us is a projection.

In the example under consideration the correspondence between the elements of $GF(2^3)$ and $M$ is given by

$$a_0, a_4 \to a_0; \qquad a_1, a_5 \to a_1; \qquad a_2, a_6 \to a_2; \qquad a_3, a_7 \to a_3. \tag{6.4}$$


If we write down the rows of the multiplication table of $GF(p^{u+v})$ and then replace each element by the corresponding element in $M$, we get a $p^{u+v}$-rowed scheme which can be shown to satisfy the conditions of Theorem 3. If we take the difference of the corresponding elements in any two rows of the multiplication table, then every element of $GF(p^{u+v})$ occurs exactly once. Also if the elements $a_i$, $a_j$ of the field correspond to the elements $a_{i'}$, $a_{j'}$ of $M$, then the element $a_i - a_j$ of the field corresponds to the element $a_{i'} - a_{j'}$ of $M$. This shows that in the scheme we have obtained each element of $M$ occurs exactly $\lambda = p^u$ times among the differences of the corresponding elements of any two rows. It follows from Theorem 3 that if each element of the scheme is now replaced by the corresponding row of the addition table of $M$ (retaining only the suffixes), we get the completely resolvable array $(\lambda s^2, \lambda s, s, 2)$, where $\lambda = p^u$, $s = p^v$.

For example when $p = 2$, $u = 1$, $v = 2$, we have to write down the rows of the multiplication table of $GF(2^3)$. This can be done by using the identifications given in (6.1), remembering that $x^8 = x$. We thus get
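$$\begin{array}{cccccccc}
a_0 & a_0 & a_0 & a_0 & a_0 & a_0 & a_0 & a_0\\
a_0 & a_1 & a_2 & a_3 & a_4 & a_5 & a_6 & a_7\\
a_0 & a_2 & a_4 & a_6 & a_5 & a_7 & a_1 & a_3\\
a_0 & a_3 & a_6 & a_5 & a_1 & a_2 & a_7 & a_4\\
a_0 & a_4 & a_5 & a_1 & a_7 & a_3 & a_2 & a_6\\
a_0 & a_5 & a_7 & a_2 & a_3 & a_6 & a_4 & a_1\\
a_0 & a_6 & a_1 & a_7 & a_2 & a_4 & a_3 & a_5\\
a_0 & a_7 & a_3 & a_4 & a_6 & a_1 & a_5 & a_2
\end{array} \tag{6.5}$$

Using the correspondence (6.4) the difference scheme is given by

$$\begin{array}{cccccccc}
a_0 & a_0 & a_0 & a_0 & a_0 & a_0 & a_0 & a_0\\
a_0 & a_1 & a_2 & a_3 & a_0 & a_1 & a_2 & a_3\\
a_0 & a_2 & a_0 & a_2 & a_1 & a_3 & a_1 & a_3\\
a_0 & a_3 & a_2 & a_1 & a_1 & a_2 & a_3 & a_0\\
a_0 & a_0 & a_1 & a_1 & a_3 & a_3 & a_2 & a_2\\
a_0 & a_1 & a_3 & a_2 & a_3 & a_2 & a_0 & a_1\\
a_0 & a_2 & a_1 & a_3 & a_2 & a_0 & a_3 & a_1\\
a_0 & a_3 & a_3 & a_0 & a_2 & a_1 & a_1 & a_2
\end{array} \tag{6.6}$$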


To obtain the completely resolvable array (32, 8, 4, 2) we replace the $a$'s in (6.6) by the suffixes in the addition table of $M$. These replacements are

$$a_0 \to 0, 1, 2, 3; \qquad a_1 \to 1, 0, 3, 2; \qquad a_2 \to 2, 3, 0, 1; \qquad a_3 \to 3, 2, 1, 0. \tag{6.7}$$

Finally the orthogonal array (32, 9, 4, 2) can be obtained by adding a final row consisting successively of 8 zeros, 8 ones, 8 twos and 8 threes. The completed array is


01230123012301230123012301230123 
01231032230132100123103223013210 
01232301012323011032321010323210 
01233210230110321032230132100123 
(6.9) 01230123103210323210321023012301 
01231032321023013210230101231032 
01232301103232102301012332101032 
01233210321001232301103210322301 
00000000111111112222222233333333


It should be noted that 9 is the maximum possible number of constraints for 
an orthogonal array of size 32 and strength 2 with 4 levels (cf. Theorem 1B). 
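The whole example can be replayed mechanically. This sketch assumes each element of $GF(2^3)$ is stored as the bit mask of its coefficients, so that the mask of $a_i$ is the integer $i$ of the lexicographic order. Multiplication reduces with $x^3 = x^2 + 1$. The projection (6.4) keeps the two low-order coefficients (the suffix mod 4). Addition in $M$ is bitwise XOR, which reproduces the replacements (6.7). Names such as `gf_mul` are our own.

```python
from itertools import combinations

POLY = 0b1101        # x^3 + x^2 + 1, the minimum function used for (6.1)
U, V = 1, 2          # lambda = 2^U = 2, s = 2^V = 4
Q = 1 << (U + V)     # 8 field elements a_0, ..., a_7
S = 1 << V           # 4 elements a_0, ..., a_3 of M
LAM = 1 << U         # index lambda = 2


def gf_mul(a, b):
    """Product in GF(2^3). Elements are coefficient bit masks, so the mask
    of a_i is the integer i of the lexicographic order."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & Q:            # a now contains x^3: replace it by x^2 + 1
            a ^= POLY
    return result


def is_oa_strength2(A, s, lam):
    """Every ordered pair of symbols occurs exactly lam times in any two rows."""
    for r1, r2 in combinations(range(len(A)), 2):
        counts = {}
        for pair in zip(A[r1], A[r2]):
            counts[pair] = counts.get(pair, 0) + 1
        if len(counts) != s * s or any(c != lam for c in counts.values()):
            return False
    return True


# Rows of the multiplication table (6.5), projected onto M as in (6.4):
# "% S" keeps the coefficients of x and 1, giving the difference scheme (6.6).
scheme = [[gf_mul(i, j) % S for j in range(Q)] for i in range(Q)]
# Theorem 3: replace a_m by its addition-table row m^0, m^1, m^2, m^3, as in (6.7).
array = [[m ^ c for m in row for c in range(S)] for row in scheme]
# Final row: component j (four columns) gets symbol j // LAM.
final_row = [j // LAM for j in range(Q) for _ in range(S)]
full = array + [final_row]   # the orthogonal array (32, 9, 4, 2)

for row in full:
    print("".join(map(str, row)))
```

Printing `full` gives the nine rows of (6.9) in order.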


7. Adjunction of new rows to the completely resolvable array $(\lambda s^2, \lambda s, s, 2)$, where $\lambda = p^u$, $s = p^v$ and $p$ is a prime. As already explained, we can add at least one more row to the array without destroying its orthogonality, giving $\lambda s + 1$ constraints in all. Let

$$u = cv + d, \qquad c \ge 0, \quad 0 \le d < v.$$

If $c = 0$, we stop after one row has been added. But if $c > 0$ we shall show that we can do better. Now $u \ge v$. Let us denote by $(A_0)$ the original completely resolvable array $(\lambda s^2, \lambda s, s, 2)$. Using the same construction as for $(A_0)$, we can obtain another completely resolvable array $(\lambda_1 s^2, \lambda_1 s, s, 2)$ where $\lambda_1 = p^{u-v}$. Let us call this array $(A_1)$. It should be noticed that the number of columns in $(A_1)$ is equal to the number of arrays of strength unity composing $(A_0)$, since $\lambda_1 s^2 = \lambda s = p^{u+v}$. We now inflate $(A_1)$ by repeating each column $s$ times, thus arriving at the array $(A_1')$, which has the same number of columns as $(A_0)$. We now adjoin $(A_1')$ to $(A_0)$, placing the former just below $(A_0)$. The result is that below any component of $(A_0)$ we get the same column of $(A_1)$ repeated $s$ times. In view of the resolvability property of $(A_0)$ it is clear that if we choose a particular row of $(A_0)$ and a particular row of $(A_1')$, then every ordered pair occurs λ times. Hence the whole array

$$\begin{pmatrix} A_0 \\ A_1' \end{pmatrix}$$

is of strength 2 and has $\lambda s + \lambda_1 s$ constraints.


Since $(A_1)$ is completely resolvable, the array $\left(\begin{smallmatrix} A_0 \\ A_1' \end{smallmatrix}\right)$ is $s$-resolvable. If $c = 1$, then $\lambda_1 < s$, and we stop after adjoining to $\left(\begin{smallmatrix} A_0 \\ A_1' \end{smallmatrix}\right)$ a final row consisting of $\lambda s$ zeros followed by $\lambda s$ ones and so on, getting $\lambda s + \lambda_1 s + 1$ constraints in all.

On the other hand, if $c > 1$, we do not adjoin the final row as yet but construct a completely resolvable array $(\lambda_2 s^2, \lambda_2 s, s, 2)$ where $\lambda_2 = p^{u-2v}$. Denote this array by $(A_2)$. We next inflate $(A_2)$ to $(A_2')$ by repeating each column $s^2$ times, and adjoin it to $\left(\begin{smallmatrix} A_0 \\ A_1' \end{smallmatrix}\right)$, arriving at the array

$$\begin{pmatrix} A_0 \\ A_1' \\ A_2' \end{pmatrix}$$

of strength 2 with $\lambda s + \lambda_1 s + \lambda_2 s$ constraints. If $c = 2$ we finish the process by adding the final row, but if $c > 2$ we continue on as before.

The whole process therefore leads to an orthogonal array of strength 2 in which the number of constraints is given by

$$\lambda s + \lambda_1 s + \cdots + \lambda_c s + 1, \qquad \lambda_i = \lambda/s^i. \tag{7.0}$$


We can therefore state the following theorem.

THEOREM 4. Given $s = p^v$, $\lambda = p^u$ (where $p$ is a prime), we can construct an orthogonal array $(\lambda s^2, k, s, 2)$ of strength 2 in which the number of constraints $k$ is given by

$$k = \frac{\lambda(s^{c+1} - 1)}{s^{c} - s^{c-1}} + 1, \tag{7.1}$$

where $c = [u/v]$.
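The sum (7.0) and the closed form (7.1) can be checked against each other. A minimal sketch with a hypothetical function name, using exact rational arithmetic so that the factor $s^{c-1}$ causes no trouble when $c = 0$:

```python
from fractions import Fraction


def theorem4_constraints(p, u, v):
    """Hypothetical helper: k for the array (lam*s^2, k, s, 2) of Theorem 4,
    with lam = p^u, s = p^v and c = [u/v]."""
    lam, s, c = p ** u, p ** v, u // v
    # (7.0): lam*s + lam_1*s + ... + lam_c*s + 1, where lam_i = lam / s^i
    k_sum = sum((lam // s ** i) * s for i in range(c + 1)) + 1
    # (7.1) in closed form; Fraction(s) ** (c - 1) is exact even for c = 0
    k_closed = (Fraction(lam * (s ** (c + 1) - 1))
                / (Fraction(s) ** c - Fraction(s) ** (c - 1)) + 1)
    assert k_closed == k_sum, "(7.0) and (7.1) disagree"
    return k_sum


for p, u, v in [(2, 1, 2), (2, 2, 1), (3, 2, 1), (2, 3, 2)]:
    print(p, u, v, theorem4_constraints(p, u, v))
```

For example $\lambda = 4$, $s = 2$ ($u = 2$, $v = 1$, $c = 2$) gives $k = 15$, which equals $(\lambda s^2 - 1)/(s - 1)$, so in that case the construction attains the bound of Theorem 1A.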
8. The use of finite projective geometries in the construction of orthogonal arrays.

THEOREM 5A. If we can find a matrix $C$ of $k$ rows and $r$ columns

$$C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1r} \\ c_{21} & c_{22} & \cdots & c_{2r} \\ \vdots & \vdots & & \vdots \\ c_{k1} & c_{k2} & \cdots & c_{kr} \end{pmatrix} \tag{8.0}$$

whose elements $c_{ij}$ belong to the Galois field $GF(p^n)$, and for which every partial matrix obtained by taking $t$ rows is of rank $t$, then we can construct an orthogonal array $(s^r, k, s, t)$, where $s = p^n$.

PROOF. Consider the $r \times 1$ column vectors ξ whose coordinates belong to $GF(p^n)$. Then there are $s^r$ different ξ. Form the matrix $A$ whose $s^r$ columns are the $k \times 1$ vectors $C\xi$. Then $A$ is the required orthogonal array.

If $A'$ is a $t \times s^r$ submatrix of $A$, and $C'$ is the corresponding $t \times r$ submatrix of $C$, the columns α of $A'$ are $C'\xi$, and since $C'$ is of rank $t$, each α is obtained from $s^{r-t}$ different ξ. Hence in $A'$ each possible $t \times 1$ column vector occurs with the frequency $\lambda = s^{r-t}$, which shows that $A$ is an orthogonal array of strength $t$ and index λ.
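The proof of Theorem 5A is constructive. The sketch below is restricted to prime fields $GF(p)$; arithmetic in $GF(p^n)$ with $n > 1$ is not implemented. It checks the rank condition on every set of $t$ rows of $C$, forms the columns $C\xi$, and verifies strength $t$ by brute force. Columns come out in the order of `itertools.product`, which differs from the order used in (9.2). The function names are our own.

```python
from itertools import combinations, product


def rank_mod_p(rows, p):
    """Rank over GF(p), p prime, by Gaussian elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)            # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(x - f * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


def theorem_5a_array(C, p, t):
    """Hypothetical helper for Theorem 5A over GF(p): check that every t rows
    of C have rank t, then return the k rows of the array with columns C*xi."""
    k, r = len(C), len(C[0])
    for rows in combinations(range(k), t):
        if rank_mod_p([C[i] for i in rows], p) < t:
            raise ValueError(f"rows {rows} of C are linearly dependent")
    columns = [[sum(c * x for c, x in zip(row, xi)) % p for row in C]
               for xi in product(range(p), repeat=r)]
    return [list(row) for row in zip(*columns)]


def has_strength(A, s, t):
    """Brute-force check that every t rows contain each t-tuple equally often."""
    lam, rem = divmod(len(A[0]), s ** t)
    if rem:
        return False
    for rows in combinations(range(len(A)), t):
        counts = {}
        for col in zip(*(A[i] for i in rows)):
            counts[col] = counts.get(col, 0) + 1
        if len(counts) != s ** t or any(c != lam for c in counts.values()):
            return False
    return True


# Section 9(a): the odd-weight points of PG(3, 2) give the array (16, 8, 2, 3).
odd_points = [list(v) for v in product(range(2), repeat=4) if sum(v) % 2 == 1]
A16 = theorem_5a_array(odd_points, 2, 3)
print(len(A16), len(A16[0]), has_strength(A16, 2, 3))
```

Here the eight odd-weight points of $PG(3, 2)$ from Section 9(a) yield a (16, 8, 2, 3) array of index 2.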

Each row of the matrix $C$ may be interpreted as the coordinates of a point in the finite projective space $PG(r-1, p^n)$; the rank condition of Theorem 5A then says that no $t$ of the $k$ points so obtained are conjoint. We thus get the following theorem:

THEOREM 5B. If we can find $k$ points in $PG(r-1, p^n)$ so that no $t$ are conjoint, then we can construct an orthogonal array $(s^r, k, s, t)$ for which $\lambda = s^{r-t}$, $s = p^n$.

It has been shown by Bose [7] that the maximum number of factors that it is possible to accommodate in a symmetrical factorial experiment in which each factor is at $s = p^n$ levels and each block is of size $s^r$, without confounding any $t$-factor or lower order interaction, is given by the maximum number of points that it is possible to choose in the finite projective space $PG(r-1, p^n)$ so that no $t$ of the chosen points are conjoint (a set of $t$ points is said to be conjoint if they lie on a flat space of dimension not greater than $t - 2$). This number is denoted by $m_t(r, s)$. It is clear from Theorem 5B that, if $s = p^n$ where $p$ is a prime, we can always construct an orthogonal array $(s^r, k, s, t)$ with any number of constraints $k \le m_t(r, s)$. The value of $m_t(r, s)$ has been determined by Bose in a number of important cases, and the corresponding set of points in which no $t$ are conjoint has been obtained. These results are used in the next section to construct some orthogonal arrays of strength 3.


9. Construction of some orthogonal arrays of strength 3.

(a) Consider the special case $s = 2$. In $PG(r-1, 2)$ consider the set of all points which do not lie on the $(r-2)$-flat

$$x_1 + x_2 + \cdots + x_r = 0. \tag{9.0}$$

There are exactly $2^{r-1}$ such points, namely the points in whose coordinates there are an odd number of unities, and the rest zero. No three of these points are collinear, since in $PG(r-1, 2)$ each line passes through exactly three points, and the third point on the line joining two of our points is their sum, which has an even number of unities; it therefore lies on the flat (9.0) and is excluded from our set. Taking the coordinates of these points for the rows of the matrix $C$ of Theorem 5A, we can construct the orthogonal array $(2^r, 2^{r-1}, 2, 3)$ of strength 3 and $2^{r-1}$ constraints. Theorem 2A shows that this is the maximum possible number of constraints.
As an illustration consider the case $r = 3$. The four points of $PG(2, 2)$ not lying on the line $x_1 + x_2 + x_3 = 0$ are (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1). Hence the corresponding matrix $C$ is

$$C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}. \tag{9.1}$$

The eight possible column vectors ξ are

01000111
00101011
00011101

The columns of the required array (8, 4, 2, 3) are obtained by forming $C\xi$, given below.

01000111
00101011
00011101
01110001   (9.2)

Similarly the array (16, 8, 2, 3) is given by (9.3) 


0100011100001111 
O0TOOTEOVULIOLT@OILII 
GOOe Teerteieoirieii 
OG0GCCLTOGLrOI1IL12101 

(9.3) O01L1LILL1100010001 
ORGOLIIVOCIIEO@iIOO! 
0110101010100101 
0111000101100011 

(b) Let $s = 2^n$. In the finite projective plane $PG(2, 2^n)$ take the non-degenerate conic

$$a x_1^2 + b x_2^2 + c x_3^2 + f x_2 x_3 + g x_3 x_1 + h x_1 x_2 = 0, \tag{9.4}$$

where

$$\Delta = af^2 + bg^2 + ch^2 + fgh \ne 0.$$
Of course no three of the points $P_1, P_2, \ldots, P_{s+1}$ on the conic (9.4) are collinear, since no line can cut it in more than two points. Through any point $(x_1, x_2, x_3)$ on the conic there pass $s + 1$ lines, of which $s$ join it to the remaining points of the conic, while there is one line which does not meet the conic in any point other than $(x_1, x_2, x_3)$. This may be called the tangent at $(x_1, x_2, x_3)$. Its equation, in current coordinates $(y_1, y_2, y_3)$, is

$$f(x_2 y_3 + x_3 y_2) + g(x_3 y_1 + x_1 y_3) + h(x_1 y_2 + x_2 y_1) = 0. \tag{9.5}$$

It is a peculiar feature of the finite projective geometry based on a field of characteristic 2 that every tangent to a given non-degenerate conic passes through the same point. For example, in the present case the arbitrary tangent (9.5) passes through the point $P_0$ with coordinates $(f, g, h)$, since substituting $y = (f, g, h)$ gives $2(ghx_1 + fhx_2 + fgx_3) = 0$; and $P_0$ is not on the conic, since substituting it in (9.4) gives $\Delta \ne 0$. The $s + 1$ tangents to (9.4) account for all the lines which pass through $P_0$. Hence no line through $P_0$ can meet the conic in more than one point. Thus $P_0, P_1, P_2, \ldots, P_{s+1}$ is a set of $s + 2$ points such that no three are collinear. Hence from Theorem 5B we can use the coordinates of these points to construct an orthogonal array $(s^3, s + 2, s, 3)$ where $s = 2^n$.
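The characteristic-2 argument can be checked by brute force for $s = 4$. This sketch rests on several assumptions of our own. It encodes $GF(4)$ as bit masks with $w^2 = w + 1$. It picks the hypothetical conic $x_2^2 + x_3x_1 = 0$ ($a = c = f = h = 0$, $b = g = 1$, so $\Delta = 1 \ne 0$) and lists its $s + 1 = 5$ points. It then adds the point $(f, g, h) = (0, 1, 0)$ and checks that no three of the six points are collinear, using a $3 \times 3$ determinant whose signs are all $+1$ in characteristic 2. Finally it builds the array (64, 6, 4, 3) as in Theorem 5A.

```python
from itertools import combinations, permutations, product

Q = 4   # GF(4) = {0, 1, w, w + 1}, stored as bit masks 0, 1, 2, 3


def gf4_mul(a, b):
    """Product in GF(4), reducing with w^2 = w + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 4:
            a ^= 0b111
    return result


def gf4_sum(values):
    """Sum in characteristic 2 (bitwise XOR)."""
    total = 0
    for v in values:
        total ^= v
    return total


def conic_value(coef, point):
    """Left side of (9.4) at point = (x1, x2, x3)."""
    a, b, c, f, g, h = coef
    x1, x2, x3 = point
    m = gf4_mul
    return gf4_sum([m(a, m(x1, x1)), m(b, m(x2, x2)), m(c, m(x3, x3)),
                    m(f, m(x2, x3)), m(g, m(x3, x1)), m(h, m(x1, x2))])


def det3(rows):
    """3 x 3 determinant over GF(4); every sign is +1 in characteristic 2."""
    return gf4_sum(gf4_mul(gf4_mul(rows[0][i], rows[1][j]), rows[2][k])
                   for i, j, k in permutations(range(3)))


COEF = (0, 1, 0, 0, 1, 0)          # hypothetical conic x2^2 + x3*x1 = 0
a, b, c, f, g, h = COEF
delta = gf4_sum([gf4_mul(a, gf4_mul(f, f)), gf4_mul(b, gf4_mul(g, g)),
                 gf4_mul(c, gf4_mul(h, h)), gf4_mul(f, gf4_mul(g, h))])

# Points of PG(2, 4), each written with its first nonzero coordinate equal to 1.
points = [v for v in product(range(Q), repeat=3)
          if any(v) and next(x for x in v if x) == 1]
on_conic = [v for v in points if conic_value(COEF, v) == 0]
six_points = on_conic + [(f, g, h)]    # P_1, ..., P_5 and the point P_0

# Theorem 5A: the 64 columns C*xi for xi in GF(4)^3.
columns = [[gf4_sum(gf4_mul(ci, xi) for ci, xi in zip(pt, vec)) for pt in six_points]
           for vec in product(range(Q), repeat=3)]
A = [list(row) for row in zip(*columns)]   # the array (64, 6, 4, 3)
print(delta, len(on_conic), len(A), len(A[0]))
```

Since the array has index 1, strength 3 here means that any three rows show all 64 triples of symbols, each exactly once.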

Similarly, when $s = p^n$ where $p$ is an odd prime, we could construct the array $(s^3, s + 1, s, 3)$ by using the coordinates of the $s + 1$ points on a non-degenerate conic of $PG(2, p^n)$.

One of the authors, Bush [10], has shown that for an orthogonal array $(s^t, k, s, t)$ of index unity and strength $t$, the number of constraints $k$ satisfies the inequality

$$k \le s + t - 1 \quad \text{when } s \text{ is even}, \tag{9.6a}$$

$$k \le s + t - 2 \quad \text{when } s \text{ is odd}. \tag{9.6b}$$

Using this result for $t = 3$, we find that the number of constraints obtained by us for arrays of size $s^3$, $s$ levels and strength 3 cannot be improved.

(c) Let $\phi(x_1, x_2) = a x_1^2 + 2h x_1 x_2 + b x_2^2$ be a homogeneous quadratic with coefficients belonging to $GF(p^n)$ and irreducible in it. If $s = p^n$, it can be shown [7] that the quadric surface

$$a x_1^2 + 2h x_1 x_2 + b x_2^2 = x_3 x_4$$

contains exactly $s^2 + 1$ points, no three of which are collinear. We therefore get a method of constructing an orthogonal array $(s^4, k, s, 3)$ with $k = s^2 + 1$ constraints, when $s$ is a prime or a prime power. On the other hand, Theorem 2C gives an upper bound $s^2 + s$ for $k$ when $s \ne 2, 4, 7$ or 16, and an upper bound for $k$ for these exceptional values of $s$ is given as $s^2 + s + 2$ by Theorem 2A. Thus there remains a gap between the number of constraints which might be attainable and the number of constraints actually attained, except for the case $s = 2$, for which we have already obtained an array $(2^4, 8, 2, 3)$ by the method (a). It is not known whether this gap can be bridged. It has been shown [7] that when $p$ is odd we cannot get more than $s^2 + 1$ points in $PG(3, p^n)$ no three of which are collinear. The same has been proved by Seiden [13] for the case $s = 2^2$.
Hence for these cases the geometrical method cannot lead to more than $s^2 + 1$ constraints, but there remains the possibility that some other combinatorial procedure may lead to a larger number of constraints.

As an example consider the case $s = 3$. The coordinates of the 10 points lying on the quadric $x_1^2 + x_2^2 = x_3 x_4$ of $PG(3, 3)$ are (0, 0, 1, 0), (0, 0, 0, 1), (0, 1, 1, 1), (0, 1, 2, 2), (1, 0, 1, 1), (1, 0, 2, 2), (1, 1, 1, 2), (1, 1, 2, 1), (1, 2, 1, 2), (1, 2, 2, 1). Using these as the rows of the matrix $C$, we get the orthogonal array (81, 10, 3, 3), and 10 is the maximum number of constraints obtainable by the geometrical methods. Theorem 2C gives $k \le 12$. We do not know whether we can get 11 or 12 constraints in any other way.
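The example can be verified directly. This minimal sketch takes the quadric to be $x_1^2 + x_2^2 = x_3x_4$ as above and represents each point of $PG(3, 3)$ by the vector whose first nonzero coordinate is 1, the normalisation used in the list of points. It finds the 10 points, confirms that every three of them are linearly independent (no three collinear), and builds the array (81, 10, 3, 3) of index 3 by the construction of Theorem 5A. Names are our own.

```python
from itertools import combinations, product

P = 3


def rank_mod_p(rows, p):
    """Rank over GF(p), p prime, by Gaussian elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)            # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(x - f * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


# Points of PG(3, 3), each written with its first nonzero coordinate equal to 1.
points = [v for v in product(range(P), repeat=4)
          if any(v) and next(x for x in v if x) == 1]
# The quadric x1^2 + x2^2 = x3*x4 (x1^2 + x2^2 is irreducible: -1 is a non-square mod 3).
quadric = [v for v in points if (v[0] ** 2 + v[1] ** 2 - v[2] * v[3]) % P == 0]
# No three collinear means every three of the points are linearly independent.
no_three_collinear = all(rank_mod_p(trio, P) == 3 for trio in combinations(quadric, 3))

# Theorem 5A: the 81 columns C*xi for xi in GF(3)^4.
columns = [[sum(c * x for c, x in zip(pt, xi)) % P for pt in quadric]
           for xi in product(range(P), repeat=4)]
A = [list(row) for row in zip(*columns)]   # the array (81, 10, 3, 3), index 3
print(len(quadric), no_three_collinear, len(A), len(A[0]))
```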
A NONPARAMETRIC TEST FOR THE SEVERAL SAMPLE PROBLEM¹


By William H. Kruskal
University of Chicago 


1. Summary. Suppose that $C$ independent random samples of sizes $n_1, \ldots, n_C$ are to be drawn from $C$ univariate populations with unknown cumulative distribution functions $F_1, \ldots, F_C$. This paper discusses a test of the null hypothesis $F_1 = F_2 = \cdots = F_C$ against alternatives of the form

$$F_i(x) = F(x - \theta_i) \qquad (\text{all } x;\ i = 1, \ldots, C)$$

with the $\theta_i$'s not all equal, or against alternatives of a much more general sort to be specified in Section 5. The test to be discussed has as its critical region large values of the ordinary F-ratio for one-way analysis of variance, computed after the observations have been replaced by their ranks in the $\sum n_i$-fold over-all sample. This use of ranks simplifies the distribution theory, and permits application of the test to cases where the ranks are available but the numerical values of the observations are difficult to obtain. Briefly, then, we shall consider a nonparametric analogue, based on ranks, of one-way analysis of variance.

It is shown in Section 4 that, under quite general conditions, the proposed 
test statistic, H, is asymptotically chi-square with $C - 1$ degrees of freedom when 
the null hypothesis holds. Section 5 derives a necessary and sufficient condition
that the natural family of sequences of tests based on large values of H all be 
consistent against a given alternative. Section 6 derives the variance of H under 
the null hypothesis, Section 7 derives the maximum value of H, and Section 8 
gives a difference equation which may be used to obtain exact small-sample 
distributions under the null hypothesis. These derivations are made on the as- 
sumption of continuity for the cumulative distribution functions; Section 9 con- 
siders extensions to the possibly discontinuous case. 


2. Introduction. Until Section 9 all cumulative distribution functions will be supposed continuous. The over-all sample consists of the $\sum n_i = N$ (say) independent random variables $\xi_j^{(i)}$ ($i = 1, \ldots, C$; $j = 1, \ldots, n_i$), where the superscript refers to the (sub)sample and the subscript indexes observations within a (sub)sample. Under the null hypothesis all the ξ's have the same continuous but unknown cdf (cumulative distribution function) $F(x)$. Each $\xi_j^{(i)}$ is immediately replaced by $X_j^{(i)}$, its rank in the over-all sample. Then, under the null hypothesis, the $N$-tuple $(X_1^{(1)}, \ldots, X_{n_1}^{(1)}, X_1^{(2)}, \ldots, X_{n_C}^{(C)})$ takes as values with equal probability the $N!$ permutations of $(1, 2, \ldots, N)$.

Next let $R_i = \sum_{j=1}^{n_i} X_j^{(i)}$ be the sum of ranks of the sample from the $i$th population and let $\bar R_i = R_i/n_i$. Of course $\sum R_i = \tfrac12 N(N+1)$. The standard one-way


1 Work done under the sponsorship of the Office of Naval Research. 
analysis of variance test based on the $X$'s has for its critical region large values of

$$\frac{\sum_{i=1}^{C} n_i\left(\bar R_i - \frac{N+1}{2}\right)^2 \Big/ (C - 1)}{\sum_{i=1}^{C}\sum_{j=1}^{n_i}\left(X_j^{(i)} - \bar R_i\right)^2 \Big/ (N - C)}.$$

But this is a monotone increasing function of

$$\frac{\sum_{i=1}^{C} n_i\left(\bar R_i - \frac{N+1}{2}\right)^2}{\sum_{i=1}^{C}\sum_{j=1}^{n_i}\left(X_j^{(i)} - \frac{N+1}{2}\right)^2}, \tag{2.1}$$

and because of the use of ranks the denominator of the above expression is a constant, namely $N(N^2 - 1)/12$. Hence a critical region consisting of large values of the numerator of (2.1) is suggested. The corresponding test is the one to be discussed in this paper.
Actually this test will be discussed in terms of the random variable

$$H = \frac{12}{N(N+1)}\sum_{i=1}^{C} n_i\left(\bar R_i - \frac{N+1}{2}\right)^2 = \frac{12}{N(N+1)}\sum_{i=1}^{C}\frac{R_i^2}{n_i} - 3(N+1).$$

Since the variance of the uniform distribution on the integers $1, 2, \ldots, N$ is $(N^2 - 1)/12$, it is natural to expect that the numerator of (2.1) divided by this variance is asymptotically chi-square with $C - 1$ degrees of freedom. But this normalized numerator is just $HN/(N - 1)$. The minor advantage of $H$ over this and other asymptotically equivalent random variables upon which the test might be based is that under the null hypothesis $EH = C - 1 = E(\text{chi-square with } C - 1 \text{ degrees of freedom})$.
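As a concrete check on the two expressions for $H$ and on the claim $EH = C - 1$, here is a small sketch (the function names are hypothetical). It ranks a pooled sample, assuming no ties as in the continuous case, and evaluates $H$ from both formulas with exact rational arithmetic. It then computes the null expectation of $H$ by averaging over every permutation of the ranks, which is feasible only for very small $N$.

```python
from fractions import Fraction
from itertools import permutations


def ranks_in_overall_sample(samples):
    """Replace each observation by its rank in the pooled sample (assumes no ties)."""
    pooled = sorted(x for sample in samples for x in sample)
    if len(set(pooled)) != len(pooled):
        raise ValueError("ties are not handled in this sketch")
    rank = {x: i + 1 for i, x in enumerate(pooled)}
    return [[rank[x] for x in sample] for sample in samples]


def kruskal_h(rank_samples):
    """H from both expressions in the text, computed exactly and cross-checked."""
    N = sum(len(r) for r in rank_samples)
    half = Fraction(N + 1, 2)
    scale = Fraction(12, N * (N + 1))
    h_dev = scale * sum(len(r) * (Fraction(sum(r), len(r)) - half) ** 2
                        for r in rank_samples)
    h_sums = scale * sum(Fraction(sum(r) ** 2, len(r)) for r in rank_samples) - 3 * (N + 1)
    assert h_dev == h_sums
    return h_dev


def null_mean_h(sizes):
    """E(H) under the null hypothesis, averaging over all N! rank assignments."""
    N = sum(sizes)
    total, count = Fraction(0), 0
    for perm in permutations(range(1, N + 1)):
        groups, start = [], 0
        for n in sizes:
            groups.append(perm[start:start + n])
            start += n
        total += kruskal_h(groups)
        count += 1
    return total / count


ranks = ranks_in_overall_sample([[0.3, 1.7], [2.5, 4.0, 3.1]])
print(ranks, kruskal_h(ranks))                      # [[1, 2], [3, 5, 4]] 3
print(null_mean_h((2, 3)), null_mean_h((1, 2, 2)))  # 1 2, i.e. C - 1 in each case
```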








3. Relationship to other tests. When $C = 2$ the test discussed in this paper is the same as the symmetrical two-tail version of a test considered by Wilcoxon [11] and by Mann and Whitney [2]; for when $C = 2$

$$H = \frac{12}{n_1 n_2 (N+1)}\left(R_1 - \frac{n_1(N+1)}{2}\right)^2.$$
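For $C = 2$ the reduction can be confirmed by enumerating every possible set of ranks for the first sample. A minimal sketch (names are hypothetical) comparing the general formula for $H$ with the two-sample form above; since $H$ depends on $R_1$ only through $(R_1 - n_1(N+1)/2)^2$, large values of $H$ correspond to both tails of the rank sum $R_1$.

```python
from fractions import Fraction
from itertools import combinations


def h_general(r1, r2):
    """H from the general formula, for C = 2 groups of ranks."""
    N = len(r1) + len(r2)
    return (Fraction(12, N * (N + 1))
            * (Fraction(sum(r1) ** 2, len(r1)) + Fraction(sum(r2) ** 2, len(r2)))
            - 3 * (N + 1))


def h_two_sample(r1, N):
    """H written through the rank sum R1 of the first sample alone."""
    n1 = len(r1)
    n2 = N - n1
    return Fraction(12, n1 * n2 * (N + 1)) * (sum(r1) - Fraction(n1 * (N + 1), 2)) ** 2


def forms_agree(n1, n2):
    """Compare the two forms over every possible set of ranks for sample 1."""
    N = n1 + n2
    for r1 in combinations(range(1, N + 1), n1):
        r2 = [x for x in range(1, N + 1) if x not in r1]
        if h_general(r1, r2) != h_two_sample(r1, N):
            return False
    return True


print(forms_agree(2, 3), forms_agree(4, 5))
```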

For $C = 2$, a test against any alternative (subject to existence of and weak conditions on a density function) is provided by the work of Wald and Wolfowitz [9], who propose and discuss a test based on runs in the over-all sample. A generalization of the Wald-Wolfowitz test for any $C$ is available (e.g., Wallis [12]). For any $C$ a test based on the median of the over-all sample and reducing to a conventional chi-square test has been proposed by Brown and Mood in Chapter 16 of [4]. A generalization of this test using several previously determined order statistics of the over-all sample is described by Massey [3]. Other tests are discussed in [1] and [13]. For $C = 2$ a recent addition to the list of tests has been made by Marshall [14].

Whitney [10] in the case of C = 3 considers two tests designed to have
particular power against the following two types of alternatives, respectively:

(1) $F_1(x) > F_2(x)$ and $F_1(x) > F_3(x)$ (all $x$),
(2) $F_1(x) > F_2(x) > F_3(x)$ (all $x$).

His tests appear to be generalizable for any C.
The test discussed in this paper, that is, that based on large values of H, is
closely akin to tests considered by Pitman, Friedman and others for two-way
analysis of variance problems (see the expository paper by Scheffé [7] for a
general discussion and references). However, this particular application of the
randomization method has not to my knowledge been discussed in the statistical
literature. Further discussion of related tests will be found in [15].


4. Asymptotic distributions. The term "asymptotic distribution" will be taken
in the sense of convergence in distribution as $N \to \infty$. We shall assume at present
that, for all $i$, $\lim_{N\to\infty} n_i/N = \nu_i$ exists and is positive. We proceed to show that
under the null hypothesis the $R_i$'s, properly normed, have asymptotically a
singular multivariate normal distribution, and that from this the asymptotic
chi-square distribution of H readily follows. The proof will be a direct application of
a powerful general theorem of Wald and Wolfowitz [8], and I shall suppose that
the reader is familiar with this reference. A consequence of the Wald-Wolfowitz
theorem, in a form appropriate for our purpose, may be stated as follows:

THEOREM 4W (Wald-Wolfowitz). Let $\{A_N\}$ be a sequence of ordered N-tuples,
$A_N = (a_{N1}, a_{N2}, \ldots, a_{NN})$ $(N = 1, 2, 3, \ldots)$, satisfying condition W of [8]. Let
$(Z_{N1}, \ldots, Z_{NN})$ for each N be a random ordered N-tuple taking as values with equal
probability the permutations of $A_N$. Let $\{n_i^{(N)}\}$ $(i = 1, \ldots, C;\ N = 1, 2, 3, \ldots)$
be C sequences of non-negative integers such that

$$\sum_{i=1}^{C} n_i^{(N)} = N, \qquad \lim_{N\to\infty} n_i^{(N)}/N = \nu_i$$

exists. Let $L_i^{(N)} = \sum Z_{N\alpha}$ for $i = 1, \ldots, C$, where the summation is from
$\alpha = \sum_{i'=1}^{i-1} n_{i'}^{(N)} + 1$ to $\alpha = \sum_{i'=1}^{i} n_{i'}^{(N)}$. Let $v_N$ be the common variance of any $Z_{N\alpha}$.
Then, asymptotically, the random variables $[L_i^{(N)} - EL_i^{(N)}]/\sqrt{N v_N}$ have the singular
C-variate normal distribution with mean zero and covariance matrix whose $i, i'$
term is

$$\delta_{ii'}\nu_i - \nu_i\nu_{i'}. \tag{4.1}$$


The proof of Theorem 4W follows from Corollary 1 of [8] via the technique used
in Section 7 of [8], that is, via the consideration of arbitrary linear combinations
of the random variables $[L_i^{(N)} - EL_i^{(N)}]/\sqrt{N v_N}$. In order to save space the
details are omitted. Note that for Theorem 4W itself it is not necessary to
assume that the $\nu_i$'s are positive.

To apply Theorem 4W to our case, set $a_{N\alpha} = \alpha$ and observe that the resulting
$\{A_N\}$ satisfies condition W of [8] (see, e.g., Section 3 of [8]). $L_i^{(N)}$ is called $R_i$, and
it may be readily computed that $ER_i = \frac12 n_i^{(N)}(N + 1)$ and $v_N = (N^2 - 1)/12$.
Hence the variables

$$\frac{R_i - \frac12 n_i(N+1)}{\sqrt{N(N^2-1)/12}}$$

(dropping the superscript "N" for convenience) have asymptotically the singular
multivariate normal distribution with zero mean and covariance matrix
given by (4.1). We next use the assumption that all the $\nu_i$'s are positive, from
which it follows immediately that the variables

$$T_i = \sqrt{12}\,\frac{R_i - ER_i}{N^{3/2}\sqrt{n_i/N}}$$

have a joint asymptotic normal distribution with zero mean vector and with the
covariance matrix whose $i, i'$ term is $\delta_{ii'} - \sqrt{\nu_i\nu_{i'}}$. We now make the standard
analysis of variance transformation

$$S_C = \sum_{i'=1}^{C}\sqrt{\nu_{i'}}\,T_{i'}, \qquad S_i = \sum_{i'=1}^{C} c_{ii'}T_{i'} \quad (i = 1, 2, \ldots, C-1),$$

with the c's chosen to make the transformation orthogonal. Since $S_C$ has asymptotic
variance $\sum_i\nu_i - (\sum_i\nu_i)^2 = 0$, while $S_1, \ldots, S_{C-1}$ are asymptotically independent
standard normal variables, it follows that $\sum_{i=1}^{C} T_i^2 = \sum_{i=1}^{C} S_i^2$ is asymptotically
chi-square with $C-1$ degrees of freedom. But

$$\sum_{i=1}^{C} T_i^2 = \frac{12}{N^2}\sum_{i=1}^{C}\frac{1}{n_i}\left(R_i - \frac{n_i(N+1)}{2}\right)^2 = \frac{N+1}{N}\,H.$$

Hence H is asymptotically chi-square with $C-1$ degrees of freedom.
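The chi-square limit with $C-1$ degrees of freedom rests on the limiting covariance matrix $\delta_{ii'} - \sqrt{\nu_i\nu_{i'}}$ being an orthogonal projection: with $u_i = \sqrt{\nu_i}$ and $\sum u_i^2 = 1$ it equals $I - uu^{\mathsf T}$, which is idempotent with trace $C-1$ and annihilates $u$ (so that $S_C$ degenerates). A small numerical check in plain Python, with an arbitrary illustrative choice of the $\nu_i$:

```python
import math


def limiting_cov(nu):
    """Covariance delta_{ii'} - sqrt(nu_i nu_i') of the limiting T_i."""
    u = [math.sqrt(v) for v in nu]
    C = len(u)
    return [[(1.0 if i == k else 0.0) - u[i] * u[k] for k in range(C)] for i in range(C)]


def matmul(A, B):
    """Plain-list matrix product."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]


nu = [0.2, 0.3, 0.5]          # illustrative limits of n_i/N; they must sum to 1
P = limiting_cov(nu)
P2 = matmul(P, P)             # should reproduce P (idempotent)
trace = sum(P[i][i] for i in range(len(P)))
u = [math.sqrt(v) for v in nu]
Pu = [sum(P[i][k] * u[k] for k in range(len(u))) for i in range(len(u))]  # should vanish
print(round(trace, 12))       # 2.0, i.e. C - 1
```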

It seems desirable to make a few comments regarding possible weakening of
the conditions for an asymptotic chi-square distribution. In the first place, no
great difficulty arises if some $\nu_i$'s should be zero. For example, suppose that
$\nu_1 = 0$ and the other $\nu_i$'s are positive. Then $(R_1 - ER_1)/N^{3/2}$ approaches zero
stochastically and $[N/(N+1)]\sum_{i=2}^{C} T_i^2$, that is, H computed from the sample
without including $R_1$, is asymptotically chi-square with $C-2$ degrees of freedom.
It is not however true in general that $\sum_{i=1}^{C} T_i^2$ is asymptotically chi-square; for
example, consider the case of $n_1 = 1$ for all N. Analogous remarks apply if more
than one $\nu_i = 0$.

If we use chi-square with $C-1$ degrees of freedom to approximate the critical
region, then it may be wise to drop from the total sample any (sub)samples with
very small $n_i$'s. We would do this in order to obtain a better approximation to
the critical region, at the expense of losing power against certain alternatives
involving the populations from which the omitted observations arose. On the other
hand, because of the smallness of the $n_i$'s in question, even the exact critical
region would probably have had little power against these alternatives.

Whitney in [10] uses a kind of limit requirement which might be thought
applicable here and weaker than the existence of the $\nu_i$'s; that is, to suppose the
existence, for $i \neq j$, of

$$\tau_{ij} = \lim_{N\to\infty}\frac{n_i n_j}{(N-n_i)(N-n_j)}.$$

(Assume all $n_i < N$, so that the τ's are defined.) That this requirement is little
weaker than ours except effectively for the case C = 2 is shown by the following
lemma.

LEMMA 4.1. If the $\nu_i$'s exist and no $\nu_i = 1$, then the $\tau_{ij}$'s exist; and if $\nu_i = 0$,
then every $\tau_{ij} = 0$ $(j = 1, 2, \ldots, C;\ j \neq i)$. When $C \geq 3$, if the $\tau_{ij}$'s exist and at
least two different $\tau_{ij}$'s are $\neq 0$, then the $\nu_i$'s exist; and if $\tau_{ij} = 0$, then either $\nu_i$ or
$\nu_j = 0$.

PROOF. The first part is obvious. To prove the second choose an i, say i = 1
for convenience. Suppose that we can then find a j and k $(j \neq k;\ j, k \neq 1)$ such
that $\tau_{jk} \neq 0$. Then

$$\frac{\tau_{1j}\tau_{1k}}{\tau_{jk}} = \lim_{N\to\infty}\left(\frac{n_1}{N-n_1}\right)^2.$$

Hence $\nu_1$ exists, and if $\tau_{1j} = 0$ then $\nu_1 = 0$. Next suppose that no such $\tau_{jk}$ exists,
that is, that the only possible nonzero τ's are $\tau_{12}, \tau_{13}, \ldots, \tau_{1C}$. By hypothesis
at least two of these must be nonzero, say $\tau_{12}$ and $\tau_{13}$. Since $\tau_{23} = 0$, it follows
as above that $\nu_2$ and $\nu_3$ exist and are zero. The same comment holds for any other
$\nu_j$ for which $\tau_{1j} \neq 0$. Finally suppose a $\tau_{1j} = 0$ $(j \neq 2, 3)$. Since $\tau_{12} \neq 0$ we have

$$\lim_{N\to\infty}\frac{n_j/(N-n_j)}{n_2/(N-n_2)} = \frac{\tau_{1j}}{\tau_{12}} = 0.$$

But the denominator here approaches zero itself. Hence the limit of $n_j/(N - n_j)$
is zero, $\nu_j$ exists, and it is zero. Of course $\nu_1 = 1$.

If only one τ, say $\tau_{12}$, is $\neq 0$, then $\nu_3, \nu_4, \ldots, \nu_C$ must all be 0. It can be shown
that $\tau_{12} = 1$ as well, so that we are effectively in the C = 2 case. It is impossible
for all the τ's to be zero.

The material of this section is summarized in the following theorem: 

THEOREM 4. If for all i, $\lim n_i/N = \nu_i$ exists and is positive, then under the null
hypothesis H is asymptotically distributed as chi-square with $C-1$ degrees of freedom.
If p $\nu_i$'s are zero $(p = 1, 2, \ldots, C-1)$, then H computed with only the $R_i$'s
corresponding to nonzero $\nu_i$'s is asymptotically distributed under the null hypothesis as
chi-square with $C-p-1$ degrees of freedom.


5. Consistency of the test based on large values of H. Suppose that the
$n_i$'s are functions of N, and consider the family of sequences of critical regions
$H \geq t_\alpha(N)$, where the level of significance $\alpha \in (0, 1)$ indexes the sequences of the
family, N indexes the members of a sequence, and $t_\alpha(N)$ is the least number with
the property $\Pr\{H \geq t_\alpha(N)\} \leq \alpha$ under the null hypothesis. Let us say that
this family of sequences is consistent against a given alternative if every member
sequence is consistent in the usual sense against the given alternative, that is,
if for all $\alpha \in (0, 1)$

$$\lim_{N\to\infty}\Pr\{H \geq t_\alpha(N)\} = 1,$$

where the probabilities are taken under the alternative. For brevity we may
simply say that the test based on large values of H is consistent. (Note that
failure of consistency for the family of sequences against an alternative implies
only that there is some $\alpha_0$ such that for all $\alpha \leq \alpha_0$ the sequence of tests $H \geq t_\alpha(N)$
fails to be consistent in the usual sense.) This use of the word "consistent" will
permit more compact statements and will not, I think, cause any confusion.





Under what circumstances is the test based on large values of H consistent? We
consider alternatives of the following form: all the ξ's are independent, and $\xi_j^{(i)}$
has a continuous cdf $F_i$. Assume that as $N \to \infty$, $n_i/N = \nu_i + o(N^{-1/2})$ with
$\nu_i > 0$; and note that this assumption subsumes the most natural case: $n_i =
[\nu_i N]$ or $[\nu_i N] + 1$. Since $t_\alpha(N)$ for given α has as its limit the upper 100α-percent
point of the chi-square distribution with $C-1$ degrees of freedom, it is equivalent
to ask under what circumstances $\lim_{N\to\infty}\Pr\{H \geq t\} = 1$ for all positive t.
We may also replace H by $\sum T_i^2$, since the two differ by a factor of $N/(N+1)$.

First, we ask under what circumstances, for all positive t, $\lim_{N\to\infty}\Pr\{|T_1| \geq t\}
= 1$. Following the useful procedure of Mann and Whitney, set V = the number
of couples $(X_j^{(1)}, X_{j'}^{(i)})$ where $i = 2$ or $3$ or $\ldots$ $C$ and for which $X_j^{(1)} > X_{j'}^{(i)}$.
Then

$$R_1 = \tfrac12 n_1(n_1 + 1) + V.$$

This relationship holds for the special case $X_j^{(1)} \leq n_1$ $(j = 1, 2, \ldots, n_1)$; for then
$V = 0$. It holds in general, since an interchange of the superscripts of two adjacent
X's, one from sample 1 and the other from one of the other samples, increases or
decreases $R_1$ and V together by unity. Then

$$T_1 = \sqrt{\frac{12}{n_1 N^2}}\left[V - \tfrac12 n_1(N - n_1)\right].$$
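The identity $R_1 = \frac12 n_1(n_1+1) + V$ and the resulting expression for $T_1$ can be checked directly. In this sketch (hypothetical helper names; no ties assumed) V is counted by brute force over all couples, and $T_1$ is computed once from $R_1$ and once from V.

```python
from fractions import Fraction
import math


def rank_sum_and_v(samples):
    """Return (R_1, V) for the first sample, where V counts couples
    (sample-1 observation, observation from another sample) in which
    the sample-1 observation is the larger.  Assumes no ties."""
    pooled = sorted(v for s in samples for v in s)
    rank = {v: r for r, v in enumerate(pooled, start=1)}
    first, others = samples[0], [v for s in samples[1:] for v in s]
    R1 = sum(rank[v] for v in first)
    V = sum(1 for a in first for b in others if a > b)
    return R1, V


def t1_two_ways(samples):
    """T_1 from the rank sum and from V; the two agree identically."""
    R1, V = rank_sum_and_v(samples)
    n1 = len(samples[0])
    N = sum(len(s) for s in samples)
    scale = math.sqrt(12 / (n1 * N ** 2))
    from_ranks = scale * (R1 - n1 * (N + 1) / 2)
    from_v = scale * (V - n1 * (N - n1) / 2)
    return from_ranks, from_v


samples = [[1.2, 3.4], [2.2, 5.0, 0.1], [4.4]]
R1, V = rank_sum_and_v(samples)
n1 = len(samples[0])
print(R1, Fraction(n1 * (n1 + 1), 2) + V)  # 6 6
```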


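Next define the following $n_1(N - n_1)$ counter-variables

$$Y_{j_1j_2}^{(i)} = \begin{cases}1 & \text{when } X_{j_1}^{(1)} > X_{j_2}^{(i)},\\ 0 & \text{otherwise,}\end{cases} \qquad (i = 2, \ldots, C;\ j_1 = 1, \ldots, n_1;\ j_2 = 1, \ldots, n_i),$$

so that

$$V = \sum_{i=2}^{C}\sum_{j_1=1}^{n_1}\sum_{j_2=1}^{n_i} Y_{j_1j_2}^{(i)}.$$

From now on we deal with a specific alternative $\{F_i\}$, which will be described in
slightly different terms further on.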


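LEMMA 5.1. The set of values of $\operatorname{Var} T_1$, as N runs through the positive integers, is
bounded.

PROOF. Set $\operatorname{Var} Y_{j_1j_2}^{(i)} = v_i$ and

$$\operatorname{Cov}\big(Y_{j_1j_2}^{(i)}, Y_{j_3j_4}^{(i')}\big) = \begin{cases} c_{ii'} & \text{for } j_1 = j_3, \text{ and either } i \neq i' \text{ or } j_2 \neq j_4,\\ d_i & \text{for } i = i',\ j_2 = j_4, \text{ and } j_1 \neq j_3,\\ 0 & \text{otherwise (since the ξ's are independent).}\end{cases}$$

Clearly the v's, c's, and d's are all less than 1 absolutely. So

$$\operatorname{Var} T_1 = \frac{12}{n_1N^2}\operatorname{Var} V \leq \frac{12}{n_1N^2}\{\text{number of } v_i \text{ terms} + \text{number of } c_{ii'} \text{ terms} + \text{number of } d_i \text{ terms}\}$$

$$= \frac{12}{n_1N^2}\left[n_1(N - n_1) + n_1(N - n_1)(N - n_1 - 1) + n_1(n_1 - 1)(N - n_1)\right] = \frac{12(N - n_1)(N - 1)}{N^2},$$

which is finite and has the limit $12(1 - \nu_1) < \infty$. We next introduce the numbers

$$g_{ii'} = \Pr\{\xi^{(i)} > \xi^{(i')}\}$$

(under the alternative) for $i \neq i'$, and $g_{ii} = \frac12$. Hence for $i > 1$, $g_{1i} = EY_{j_1j_2}^{(i)}$, and

$$ET_1 = \sqrt{\frac{12}{n_1N^2}}\left[\sum_{i=2}^{C} n_1 n_i g_{1i} - \tfrac12 n_1(N - n_1)\right].$$

Hence the limit of $ET_1/\sqrt{n_1}$ is

$$\sqrt{12}\left[\sum_{i=2}^{C}\nu_i g_{1i} - \tfrac12(1 - \nu_1)\right] = \sqrt{12}\left[\sum_{i=1}^{C}\nu_i g_{1i} - \tfrac12\right],$$

whence $|ET_1| \to \infty$ whenever $\sum_{i=1}^{C}\nu_i g_{1i} \neq \frac12$,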
and we have

LEMMA 5.2. If $\sum_{i=1}^{C}\nu_i g_{1i} \neq \frac12$, then $\lim_{N\to\infty}\Pr\{|T_1| \geq t\} = 1$. This follows
immediately from Tchebycheff's inequality, since $\operatorname{Var} T_1$ is bounded while $|ET_1| \to \infty$.
Consequently we may state

LEMMA 5.3. If for some i, $\sum_{i'=1}^{C}\nu_{i'}g_{ii'} \neq \frac12$, then the test based on large values
of H is consistent. We now turn to implications in the other direction.

LEMMA 5.4. If $\sum_{i=1}^{C}\nu_i g_{1i} = \frac12$, then there exist a $t_0$, a function $N_0(t)$, and a
decreasing function $G(t)$ such that

(1) $\lim_{t\to\infty} G(t) = 0$,

(2) for $t > t_0$ and $N > N_0(t)$, $\Pr\{|T_1| \geq t\} \leq G(t) < 1$.

PROOF. Let K be an upper bound for $\operatorname{Var} T_1$. Then by Tchebycheff's inequality,
for $t > |ET_1|$,

$$\Pr\{|T_1| \geq t\} = \Pr\{T_1 \geq t\} + \Pr\{T_1 \leq -t\} \leq \frac{K}{(t - ET_1)^2} + \frac{K}{(t + ET_1)^2},$$

which has the limit $2K/t^2$, since $ET_1 \to 0$ (by the hypothesis of the lemma and the
assumption $n_i/N = \nu_i + o(N^{-1/2})$). Putting $\varepsilon(t) = [\max(1, t)]^{-1}/4$ and
$t_0 = 2\sqrt{K}$, it follows that for any $t > t_0$ there is an $N_0(t)$ such that for
$N > N_0(t)$

$$\Pr\{|T_1| \geq t\} \leq \frac{2K}{t^2} + \varepsilon(t) < \tfrac34 < 1.$$

$G(t) = 2K/t^2 + \varepsilon(t)$ is the function in the middle of the above chain of inequalities.
Next this is generalized as follows:

LEMMA 5.5. If for all i, $\sum_{i'=1}^{C}\nu_{i'}g_{ii'} = \frac12$, then the test based on large values of
H is not consistent.

PROOF. By the previous lemma there are (for each i) numbers $t_0^{(i)}$ and functions
$N_0^{(i)}(t)$ and $G^{(i)}(t)$ such that $G^{(i)}(t) \to 0$ from above monotonically as $t \to \infty$,
and such that for $t > t_0^{(i)}$ and $N > N_0^{(i)}(t)$, $\Pr\{|T_i| \geq t\} \leq G^{(i)}(t) < 1$. Let
$t_0^* = \max_i t_0^{(i)}$, $N_0^*(t) = \max_i N_0^{(i)}(t)$, and $G^*(t) = \max_i G^{(i)}(t)$. Then $G^*(t)$ is a
monotone decreasing function with limit 0, and for all i, $t > t_0^*$, and $N > N_0^*(t)$,
$\Pr\{|T_i| \geq t\} \leq G^*(t) < 1$. For $t > t_0^*$ and $N > N_0^*(t)$, $\Pr\{\text{some } |T_i| \geq t\} \leq
C\cdot G^*(t)$. But $\sum T_i^2 \geq s > 0$ implies that some $|T_i| \geq \sqrt{s/C}$. Hence for
$s > C\,t_0^{*2}$ and $N > N_0^*(\sqrt{s/C})$ we have

$$\Pr\Big\{\sum T_i^2 \geq s\Big\} \leq C\cdot G^*\big(\sqrt{s/C}\big).$$

To complete the proof take s large enough so that $C\cdot G^*(\sqrt{s/C}) < 1$.

It is natural to ask for a simple probabilistic interpretation of the necessary
and sufficient condition for consistency which has been proven. This may be
done as follows. Recall that we are still discussing a fixed alternative $\{F_i\}$ and
that all probabilities are taken with respect to this alternative. Now let $\eta^{(1)}, \ldots,
\eta^{(C)}$ be C independent random variables, independent of all the ξ's and with cdf's
$F_i$. Then

$$g_{ii'} = \Pr\{\eta^{(i)} > \eta^{(i')}\}.$$

Next choose a $\xi_J^{(I)}$ at random from among the N possibilities (i.e., take an
observation in the space of N ordered couples (I, J), where each has the same
probability 1/N). Then

$$\Pr\{\eta^{(i)} > \xi_J^{(I)}\} = \frac1N\sum_{i'=1}^{C}\sum_{j=1}^{n_{i'}} g_{ii'} = \sum_{i'=1}^{C}\frac{n_{i'}}{N}\,g_{ii'},$$

so that the test based on large values of H is inconsistent if and only if for all i,
$\lim_{N\to\infty}\Pr\{\eta^{(i)} > \xi_J^{(I)}\} = \frac12$. Roughly speaking, this means that the test is
consistent if and only if the variables from at least one population tend in the limit
to be either larger or smaller than the other variables.

In particular we have consistency under the following circumstances, which
generalize to the C-population case the sufficient conditions for C = 2 given in
[2] by Mann and Whitney²:

$$F_1(x) \leq F_2(x), \qquad F_1(x) \leq F_i(x) \quad (i = 3, 4, \ldots, C),$$

for all x, with $F_1(x) < F_2(x)$ for at least one x. (Of course the choice of subscripts 1
and 2 here is just for convenience.) To show that the consistency condition is
satisfied, note that for $i = 3, 4, \ldots, C$

$$g_{1i} = \int F_i(x)\,dF_1(x) \geq \int F_1(x)\,dF_1(x) = \tfrac12$$

because of symmetry. However $g_{12} > \frac12$; for let m run through the positive
integers and set $B_m$ equal to the set of all x satisfying $F_1(x) + 1/m \geq F_2(x) >
F_1(x) + 1/(m+1)$. Then

$$g_{12} = \int F_2(x)\,dF_1(x) = \int F_1(x)\,dF_1(x) + \sum_{m=1}^{\infty}\int_{B_m}[F_2(x) - F_1(x)]\,dF_1(x) \geq \tfrac12 + \sum_{m=1}^{\infty}\frac{1}{m+1}\int_{B_m} dF_1(x),$$

and clearly at least one $B_m$ must have positive measure with respect to $F_1$. Hence
$\sum_{i=1}^{C}\nu_i g_{1i} > \frac12$, and we have consistency. The circumstances just discussed
include the translational sort of alternative described in the introductory
paragraph of this paper.

² The unnecessary specialization of the Mann and Whitney consistency condition when
C = 2 was noted (separately) by Lehmann and van Dantzig; see p. 166 of [1] and [16]. In
the latter both sufficiency and necessity are considered by a method similar to that of this
paper, and further results are obtained. In 1948 E. J. G. Pitman gave the same necessary
and sufficient condition for C = 2 during lectures at Columbia University.

A simple class of cases for which consistency fails and yet the null hypothesis
need not hold is given by the following characteristic: that all the C distributions
be symmetrical about the same point f in the following sense:

$$F_i(f - z) = 1 - F_i(f + z)$$

for all i and z. For, setting f = 0 without loss of generality, this means that the
distribution of every ξ is the same as that of its negative. Hence for all $i, i'$,
$g_{ii'} = g_{i'i} = \frac12$ and consistency fails.
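The consistency criterion can be evaluated for concrete alternatives. Purely as an illustration (normality is an assumption, not something the text requires), suppose the $\eta^{(i)}$ are normal with means $\mu_i$ and standard deviations $\sigma_i$; then $g_{ii'} = \Phi\big((\mu_i - \mu_{i'})/\sqrt{\sigma_i^2 + \sigma_{i'}^2}\big)$. The sketch computes $\sum_{i'}\nu_{i'}g_{ii'}$ for each i with the standard library's `statistics.NormalDist`. A location shift moves some entry away from ½, giving consistency. Distributions with a common centre but different spreads belong to the symmetric class just described, and every entry stays at ½, giving inconsistency.

```python
import math
from statistics import NormalDist


def g(mu_i, sd_i, mu_k, sd_k):
    """Pr{eta_i > eta_k} for independent normals N(mu_i, sd_i^2), N(mu_k, sd_k^2)."""
    diff = NormalDist(mu_i - mu_k, math.hypot(sd_i, sd_k))  # law of eta_i - eta_k
    return 1.0 - diff.cdf(0.0)


def consistency_sums(nu, mus, sds):
    """For each i, sum_{i'} nu_{i'} g_{ii'}; consistency iff some entry differs from 1/2."""
    C = len(nu)
    return [sum(nu[k] * g(mus[i], sds[i], mus[k], sds[k]) for k in range(C))
            for i in range(C)]


nu = [0.3, 0.3, 0.4]
shift = consistency_sums(nu, mus=[0.0, 0.0, 1.0], sds=[1.0, 1.0, 1.0])
spread = consistency_sums(nu, mus=[0.0, 0.0, 0.0], sds=[1.0, 2.0, 3.0])
print([round(s, 3) for s in shift])
print([round(s, 3) for s in spread])  # [0.5, 0.5, 0.5]
```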

The material of this section may be summarized as follows. 

THEOREM 5. Suppose that the $n_i$'s are functions of N and that for all i, $n_i/N =
\nu_i + o(N^{-1/2})$ and $\nu_i > 0$. For each level of significance α $(0 < \alpha < 1)$ consider the
sequence of tests: reject the null hypothesis if $H \geq t_\alpha(N)$, where $t_\alpha(N)$ is the least
number giving rise to level of significance α at the Nth step. Then these sequences of
tests are all consistent against a given continuous alternative $\{F_i\}$ if and only if for
some i, with probabilities taken under the alternative,

$$\sum_{i'=1}^{C}\nu_{i'}\left[\Pr\{\eta^{(i)} > \eta^{(i')}\} + \tfrac12\Pr\{\eta^{(i)} = \eta^{(i')}\}\right] \neq \tfrac12,$$

where the $\eta^{(i)}$'s are C independent random variables having respectively the cdf's $F_i$.
The sufficiency of the above condition holds regardless of the order of $(n_i/N) -
\nu_i$. When C = 2 the denial of the above condition implies $g_{12} = g_{21} = \frac12$.


6. The variance of H under the null hypothesis. As an aid in approximating
the distribution of H when the null hypothesis is true, we seek the variance of
H under the null hypothesis. This seems to be a tedious computation by any
method; we shall outline a direct method, omitting most of the routine algebra.
Directly from the definition of H we have

$$\operatorname{Var} H = \frac{144}{N^2(N+1)^2}\left[\sum_{i=1}^{C}\frac{1}{n_i^2}ER_i^4 + \sum_{i\neq i'}\frac{1}{n_in_{i'}}E(R_i^2R_{i'}^2) - \Big(\sum_{i=1}^{C}\frac{1}{n_i}ER_i^2\Big)^2\right]. \tag{6.1}$$
534 WILLIAM H. KRUSKAL 


$ER_i^4$ is readily found from formula (8) of [2], which when translated into our
notation says

$$E\left(R_i - \frac{n_i(N+1)}{2}\right)^4 = \frac{n_i(N - n_i)(N+1)}{240}\left[5Nn_i(N - n_i) - 2n_i^2 - 2(N - n_i)^2 + 3n_i(N - n_i) - 2N\right].$$

From this

$$\frac{1}{n_i^2}ER_i^4 = \frac{N+1}{240}\left\{n_i^2[15N^3 + 15N^2 - 10N - 8] + n_i[30N^3 + 50N^2 + 16N] + [5N^3 + 9N^2 + 2N] - \frac{1}{n_i}[2N^3 + 2N^2]\right\},$$

and, summing over i,

$$\sum_{i=1}^{C}\frac{1}{n_i^2}ER_i^4 = \frac{N+1}{240}[15N^3 + 15N^2 - 10N - 8]\sum_{i=1}^{C}n_i^2 + \frac{N^2(N+1)}{240}[30N^2 + 50N + 16] + \frac{CN(N+1)}{240}[5N^2 + 9N + 2] - \frac{N^2(N+1)^2}{120}\sum_{i=1}^{C}\frac{1}{n_i}. \tag{6.2}$$

Next we find $E(R_1^2R_2^2)$ as follows:

$$E(R_1^2R_2^2) = \sum_{i}\sum_{i'}\sum_{j}\sum_{j'}E\big(X_i^{(1)}X_{i'}^{(1)}X_j^{(2)}X_{j'}^{(2)}\big),$$

where $i, i' = 1, 2, \ldots, n_1$ and $j, j' = 1, 2, \ldots, n_2$. This quantity is

$$n_1(n_1-1)n_2(n_2-1)E[X_1^{(1)}X_2^{(1)}X_1^{(2)}X_2^{(2)}] + n_1n_2(n_2-1)E[(X_1^{(1)})^2X_1^{(2)}X_2^{(2)}] + n_1(n_1-1)n_2E[X_1^{(1)}X_2^{(1)}(X_1^{(2)})^2] + n_1n_2E[(X_1^{(1)})^2(X_1^{(2)})^2]$$

$$= \frac{n_1n_2(n_1-1)(n_2-1)}{N(N-1)(N-2)(N-3)}\sum pp'qq' + \frac{n_1n_2(n_1+n_2-2)}{N(N-1)(N-2)}\sum p^2qq' + \frac{n_1n_2}{N(N-1)}\sum p^2q^2,$$

where the p's and q's run from 1 to N and within any term of a summation no
two are equal. This simplifies after some algebraic labor; we divide by $n_1n_2$ and
make the obvious generalization from (1, 2) to (i, j) to find

$$\frac{1}{n_in_j}E(R_i^2R_j^2) = (n_i-1)(n_j-1)\frac{N+1}{240}[15N^3 + 15N^2 - 10N - 8] + (n_i + n_j - 2)\frac{N+1}{360}[30N^3 + 35N^2 - 11N - 12] + \frac{N+1}{180}[20N^3 + 24N^2 - 5N - 6].$$
Sum this over $i \neq j$ to obtain

$$\sum_{i\neq j}\frac{1}{n_in_j}E(R_i^2R_j^2) = \frac{N+1}{240}\Big[(N - C)^2 + 2N - C - \sum_{i=1}^{C}n_i^2\Big][15N^3 + 15N^2 - 10N - 8] + \frac{N+1}{180}(C-1)(N-C)[30N^3 + 35N^2 - 11N - 12] + \frac{N+1}{180}C(C-1)[20N^3 + 24N^2 - 5N - 6]. \tag{6.3}$$

Next, we note that

$$\frac{1}{n_i}ER_i^2 = \frac{(N+1)(N - n_i)}{12} + \frac{n_i(N+1)^2}{4},$$

which, summed over i and squared, gives

$$\Big(\sum_{i=1}^{C}\frac{1}{n_i}ER_i^2\Big)^2 = \frac{N^2(N+1)^2}{144}\left(9N^2 + 6N(C+2) + (C+2)^2\right). \tag{6.4}$$


Finally, substituting (6.2), (6.3), and (6.4) in (6.1), and simplifying, we obtain

$$\operatorname{Var} H = 2(C-1) - \frac{2[3C^2 - 6C + N(2C^2 - 6C + 1)]}{5N(N+1)} - \frac65\sum_{i=1}^{C}\frac{1}{n_i}. \tag{6.5}$$

Note that as all the $n_i$'s $\to \infty$, $\operatorname{Var} H \to 2(C-1)$ = Var (chi-square with $C-1$
degrees of freedom).
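Formula (6.5) can be confirmed for small samples by brute force. The sketch below enumerates all N! equally likely rank orders under the null hypothesis (so it is practical only for N up to about 8), computes the exact mean and variance of H in rational arithmetic, and compares the variance with (6.5). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def exact_null_moments(sizes):
    """Exact null mean and variance of H by enumerating all N! rank orders."""
    N = sum(sizes)
    k = Fraction(12, N * (N + 1))
    values = []
    for perm in permutations(range(1, N + 1)):
        total, start = Fraction(0), 0
        for n in sizes:                      # consecutive blocks form the samples
            R = sum(perm[start:start + n])
            total += Fraction(R * R, n)
            start += n
        values.append(k * total - 3 * (N + 1))
    mean = sum(values) / len(values)
    var = sum((h - mean) ** 2 for h in values) / len(values)
    return mean, var


def var_h_formula(sizes):
    """Formula (6.5)."""
    N, C = sum(sizes), len(sizes)
    recip = sum(Fraction(1, n) for n in sizes)
    middle = Fraction(2 * (3 * C ** 2 - 6 * C + N * (2 * C ** 2 - 6 * C + 1)), 5 * N * (N + 1))
    return 2 * (C - 1) - middle - Fraction(6, 5) * recip


for sizes in [(1, 1, 2), (2, 2, 1), (3, 2)]:
    mean, var = exact_null_moments(sizes)
    print(sizes, mean, var, var_h_formula(sizes))
```

The exact mean also equals $C-1$ in each case, as stated in Section 2.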


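7. The maximum value of H. It is an aid in approximating the distribution of
H to know its maximum value. This may be obtained from the well-known
analysis of variance algebraic identity

$$\sum_{i=1}^{C}\frac{1}{n_i}\left(R_i - \frac{n_i(N+1)}{2}\right)^2 + \sum_{i=1}^{C}\sum_{j=1}^{n_i}\left(X_j^{(i)} - \frac{R_i}{n_i}\right)^2 = \sum_{i=1}^{C}\sum_{j=1}^{n_i}\left(X_j^{(i)} - \frac{N+1}{2}\right)^2 = \frac{1}{12}N(N^2 - 1). \tag{7.1}$$

A sample point maximizing H is a sample point minimizing the second term on
the left side of the above identity, that is, the within sum of squares. Clearly this
sum of squares is minimized when the ranks within each (sub)sample form
consecutive integers, that is, $X_j^{(i)} - R_i/n_i = j - \frac12(n_i + 1)$ (with the observations of each subsample suitably ordered), so that

$$\sum_{i=1}^{C}\sum_{j=1}^{n_i}\left(X_j^{(i)} - \frac{R_i}{n_i}\right)^2 = \sum_{i=1}^{C}\frac{n_i(n_i^2 - 1)}{12} = \frac{1}{12}\Big(\sum_{i=1}^{C}n_i^3 - N\Big).$$

Substituting back in (7.1) it follows that the maximum value of H is

$$(N - 1) - \frac{\sum_{i=1}^{C}n_i^3 - N}{N(N+1)} = \frac{N^3 - \sum_{i=1}^{C}n_i^3}{N(N+1)}. \tag{7.2}$$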
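Formula (7.2) can likewise be checked by brute force for small sample sizes. In this sketch (hypothetical helper names) every rank order is tried and the largest value of H is compared with $(N^3 - \sum n_i^3)/[N(N+1)]$; the maximum is attained when each sample occupies a block of consecutive ranks.

```python
from fractions import Fraction
from itertools import permutations


def h_of(blocks, N):
    """H from the ranks of each sample."""
    total = sum(Fraction(sum(r) ** 2, len(r)) for r in blocks)
    return Fraction(12, N * (N + 1)) * total - 3 * (N + 1)


def h_max_formula(sizes):
    """(7.2): (N^3 - sum n_i^3) / (N(N+1))."""
    N = sum(sizes)
    return Fraction(N ** 3 - sum(n ** 3 for n in sizes), N * (N + 1))


def h_max_bruteforce(sizes):
    """Largest H over all N! rank orders (small N only)."""
    N = sum(sizes)
    best = None
    for perm in permutations(range(1, N + 1)):
        blocks, start = [], 0
        for n in sizes:
            blocks.append(perm[start:start + n])
            start += n
        h = h_of(blocks, N)
        best = h if best is None or h > best else best
    return best


for sizes in [(1, 1, 2), (2, 2, 1), (3, 1, 2)]:
    print(sizes, h_max_formula(sizes), h_max_bruteforce(sizes))
```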








8. On the distribution of the $R_i$'s and H. If we set

$$\Gamma(r_1, r_2, \ldots, r_C;\ n_1, n_2, \ldots, n_C) = \frac{\left(\sum_i n_i\right)!}{n_1!\cdots n_C!}\Pr\{R_i = r_i\ (i = 1, 2, \ldots, C) \text{ with stated } n_i\text{'s}\},$$

then, under the null hypothesis, the following difference equation holds for Γ:

$$\Gamma(r_1, \ldots, r_C;\ n_1, \ldots, n_C) = \sum_{i=1}^{C}\Gamma(r_1, \ldots, r_{i-1}, r_i - N, r_{i+1}, \ldots, r_C;\ n_1, \ldots, n_{i-1}, n_i - 1, n_{i+1}, \ldots, n_C), \tag{8.1}$$

where $N = \sum_i n_i$, with the following boundary conditions:

1. If any argument fails to be a non-negative integer, Γ = 0.

2. If $n_i = 0$ but $r_i \neq 0$, then Γ = 0.

3. When $n_1 = n_2 = \cdots = n_C = 0$, Γ = 1 when all $r_i$'s are 0, and otherwise
Γ = 0.

The above equation and conditions follow readily from the partition of the
chance event $\{R_1 = r_1, \ldots, R_C = r_C\}$ into the C chance events $\{R_1 = r_1, \ldots,
R_C = r_C, \text{ and } \max_{i',j}X_j^{(i')} \text{ is an } X_j^{(i)}\}$ for $i = 1, 2, \ldots, C$.
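The recursion (8.1) translates directly into a memoized function. In the sketch below (hypothetical names; positive sample sizes assumed), `gamma_count` follows the difference equation and its three boundary conditions literally. The exact null distribution of H is then obtained by dividing each Γ by the multinomial coefficient $N!/(n_1!\cdots n_C!)$.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product
from math import factorial, prod


@lru_cache(maxsize=None)
def gamma_count(r, n):
    """Gamma(r; n) of (8.1): number of ways to split ranks 1..sum(n) into
    labelled samples of sizes n whose rank sums are r (both tuples)."""
    if any(x < 0 for x in r) or any(x < 0 for x in n):     # boundary condition 1
        return 0
    if all(x == 0 for x in n):                              # boundary condition 3
        return 1 if all(x == 0 for x in r) else 0
    if any(ni == 0 and ri != 0 for ri, ni in zip(r, n)):    # boundary condition 2
        return 0
    N = sum(n)
    total = 0
    for i in range(len(n)):      # the largest rank N belongs to sample i
        r_next = r[:i] + (r[i] - N,) + r[i + 1:]
        n_next = n[:i] + (n[i] - 1,) + n[i + 1:]
        total += gamma_count(r_next, n_next)
    return total


def null_distribution(sizes):
    """Exact null distribution of H as {value: probability}."""
    n = tuple(sizes)
    N = sum(n)
    top = N * (N + 1) // 2                       # sum of all ranks
    multinomial = factorial(N) // prod(factorial(k) for k in n)
    dist = {}
    for head in product(range(top + 1), repeat=len(n) - 1):
        r = head + (top - sum(head),)            # last rank sum is forced
        count = gamma_count(r, n)
        if count:
            h = Fraction(12, N * (N + 1)) * sum(Fraction(ri * ri, ni) for ri, ni in zip(r, n)) - 3 * (N + 1)
            dist[h] = dist.get(h, 0) + Fraction(count, multinomial)
    return dist


print(sorted(null_distribution((1, 1, 2)).items()))
# [(Fraction(3, 10), Fraction(1, 6)), (Fraction(9, 5), Fraction(1, 3)), (Fraction(27, 10), Fraction(1, 2))]
```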

I have been unable to find a closed solution for this equation. For C = 2 and
small $n_i$'s, values of $[n_1!n_2!/(n_1+n_2)!]\,\Gamma(r_1, r_2; n_1, n_2)$ are given in [2]. (Comment
on notation: m, n, and U of [2] correspond to our $n_2$, $n_1$, and $R_1 - \frac12 n_1(n_1+1)$
respectively. The tables of [2] actually give the cumulative probabilities.) For
C = 3 and for small values of the $n_i$'s, recursive computations based on (8.1)
are being carried out in order to obtain exact distributions of H. It is hoped
that from these exact distributions some idea of the accuracy of various
approximations may be obtained.

In Section 6 formula (6.5) for the variance of H under the null hypothesis was
obtained as a function of N, C, and $\sum 1/n_i$. It seems reasonable to attempt to
better the chi-square approximation by fitting a Type III distribution with
density function

$$\frac{a^{\nu}}{\Gamma(\nu)}\,t^{\nu-1}e^{-at}$$

for $t \geq 0$, and 0 for $t < 0$. Equating first and second moments gives

$$a = (C-1)/\operatorname{Var} H, \qquad \nu = (C-1)^2/\operatorname{Var} H.$$

Equivalently, one may approximate $\Pr\{H \leq x\}$ under the null hypothesis by
the use of K. Pearson's incomplete Γ function tables [5], setting u and p of those
tables equal respectively to

$$x/\sqrt{\operatorname{Var} H}, \qquad (C-1)^2/\operatorname{Var} H - 1.$$
Again equivalently, one can make the same approximation by interpolation in
a chi-square table using $2(C-1)^2/\operatorname{Var} H$ degrees of freedom and argument
$2x(C-1)/\operatorname{Var} H$.
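A minimal sketch of the Type III approximation, assuming only the Python standard library. The regularized lower incomplete gamma function is evaluated by its power series, an implementation choice adequate for moderate arguments, and $\operatorname{Var} H$ is taken from (6.5). When $\operatorname{Var} H = 2(C-1)$ the fitted law is exactly chi-square with $C-1$ degrees of freedom, which gives a convenient check. The helper names are hypothetical.

```python
import math
from fractions import Fraction


def lower_reg_gamma(s, x, tol=1e-16, max_terms=100000):
    """Regularized lower incomplete gamma P(s, x), s > 0, via the series
    P(s, x) = x^s e^{-x} / Gamma(s + 1) * sum_{k>=0} x^k / ((s+1)...(s+k))."""
    if x <= 0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, max_terms):
        term *= x / (s + k)
        total += term
        if term < tol * total:
            break
    return math.exp(s * math.log(x) - x - math.lgamma(s + 1)) * total


def var_h(sizes):
    """Null variance of H, formula (6.5)."""
    N, C = sum(sizes), len(sizes)
    middle = Fraction(2 * (3 * C**2 - 6 * C + N * (2 * C**2 - 6 * C + 1)), 5 * N * (N + 1))
    return float(2 * (C - 1) - middle - Fraction(6, 5) * sum(Fraction(1, n) for n in sizes))


def type_iii_cdf(x, C, variance):
    """Pr{H <= x} from the moment-fitted Type III law: a = (C-1)/Var H, nu = (C-1)^2/Var H."""
    a = (C - 1) / variance
    nu = (C - 1) ** 2 / variance
    return lower_reg_gamma(nu, a * x)


sizes = (5, 5, 5)
print(type_iii_cdf(5.99, len(sizes), var_h(sizes)))
```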

In Section 7 formula (7.2) for the maximum value of H, say $H_M$, was obtained
in terms of N and $\sum n_i^3$. It seems reasonable to attempt to better the chi-square
and Type III approximations by fitting an incomplete B distribution and using
[6] to obtain approximate probabilities. Thus, equating moments again, we may
approximate $\Pr\{H \leq x\}$ under the null hypothesis by $I_{x/H_M}(p, q)$, in the
notation of [6], where

$$p = \frac{C-1}{H_M}\cdot\frac{(C-1)(H_M - C + 1) - \operatorname{Var} H}{\operatorname{Var} H}, \qquad q = \frac{H_M - C + 1}{C - 1}\,p.$$
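The beta parameters can be computed, and checked against the moments they are meant to reproduce, as follows. In this sketch `beta_parameters` is a hypothetical helper, and the check confirms that $H_M\cdot\mathrm{Beta}(p, q)$ has mean $C-1$ and variance $\operatorname{Var} H$. Evaluating $I_{x/H_M}(p, q)$ itself needs an incomplete-beta routine, for example SciPy's `scipy.special.betainc(p, q, x / h_max)`, which is not used here.

```python
from fractions import Fraction


def beta_parameters(C, var_h, h_max):
    """Moment-fitted (p, q) so that H / H_M is approximately Beta(p, q)."""
    p = Fraction(C - 1) * ((C - 1) * (h_max - C + 1) - var_h) / (h_max * var_h)
    q = p * (h_max - C + 1) / (C - 1)
    if p <= 0 or q <= 0:
        raise ValueError("moments are not attainable by a beta distribution")
    return p, q


def scaled_beta_moments(p, q, h_max):
    """Mean and variance of h_max * Beta(p, q)."""
    mean = h_max * p / (p + q)
    var = h_max ** 2 * p * q / ((p + q) ** 2 * (p + q + 1))
    return mean, var


# Sizes (2, 2, 1): Var H = 106/75 by (6.5) and H_M = 18/5 by (7.2).
p, q = beta_parameters(3, Fraction(106, 75), Fraction(18, 5))
print(p, q)                    # 335/477 268/477
print(scaled_beta_moments(p, q, Fraction(18, 5)))
```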


The above formulas are given for convenient reference. The relative merits of 
these approximations will be discussed in [15]. 


9. The possibly discontinuous case. Much of the preceding material carries
almost directly over to the general case in which the cdf's need not be continuous,
providing that the following randomization convention is followed: when two
or more ξ's are equal, define the corresponding X's at random. More precisely,
suppose that $\xi_{j_1}^{(i_1)} = \xi_{j_2}^{(i_2)} = \cdots = \xi_{j_\omega}^{(i_\omega)}$ for a given sample point, and that all
other ξ's are unequal to the common value of the above ω ξ's, with (say) λ ξ's
less than the common value. Then assign ranks λ + 1, λ + 2, ..., λ + ω to the tied
ξ's by performing a random experiment in which each of the ω! possible
assignments is an equally likely outcome. With this convention the joint distribution
of the X's under the null hypothesis is the same as that stated in Section 2, so
that the asymptotic chi-square distribution (Section 4) holds.

The following minor changes would be made in Section 5:

(1) In the discussion of the intuitive interpretation for the consistency
condition, replace the given expression for $g_{ii'}$ by $g_{ii'} = \Pr\{\eta^{(i)} > \eta^{(i')}\} +
\frac12\Pr\{\eta^{(i)} = \eta^{(i')}\}$, and replace the necessary and sufficient condition for
inconsistency by

$$\lim_{N\to\infty}\left[\Pr\{\eta^{(i)} > \xi_J^{(I)}\} + \tfrac12\Pr\{\eta^{(i)} = \xi_J^{(I)}\}\right] = \tfrac12 \quad \text{for all } i.$$

(2) In the discussion of consistency when $F_1 \leq F_2$ and $F_1 \leq F_i$
$(i = 3, 4, \ldots, C)$, insert the remark that the result continues to hold if we consider
the cdf's not in one of the usual senses (i.e., continuous to the left or to the
right), but rather in the sense of Lévy: $\frac12F(x^-) + \frac12F(x^+)$. The same interpretation
of the cdf notation should be made in the discussion of a class of cases for
which consistency fails.

(3) Delete the word "continuous" in the statement of Theorem 5.

Another way to treat ties, much discussed in connection with the rank
correlation coefficient, is to give tied ξ's equal fractional ranks so as to keep the sum of
ranks at its usual value; i.e., in the notation of the first paragraph of this section,
assign the fractional rank $\lambda + \frac12(\omega + 1)$ to all the ω tied ξ's. We proceed to show
that if we do this, and also change the norming constants appropriately, the
altered H still is asymptotically chi-square with $C-1$ degrees of freedom.

Suppose that there are K groups of ties with, respectively, $\omega_1, \omega_2, \ldots, \omega_K$
members. We agree to use mean ranks in the tied groups and to work in the
conditional distribution wherein just K tied groups exist, of sizes $\omega_1, \ldots, \omega_K$
and covering fixed rank intervals, but permitting the numbers of observations
from the C subsamples falling in any tied group to vary. In other words, instead
of the finite population (1, 2, ..., N), we deal with

$$1, 2, \ldots, \lambda_1,\ \lambda_1 + \tfrac12(\omega_1 + 1), \ldots, \lambda_1 + \tfrac12(\omega_1 + 1),\ \lambda_1 + \omega_1 + 1, \ldots, \lambda_2,\ \lambda_2 + \tfrac12(\omega_2 + 1), \ldots, \lambda_2 + \tfrac12(\omega_2 + 1),\ \lambda_2 + \omega_2 + 1, \ldots, \lambda_K + \tfrac12(\omega_K + 1), \ldots, \lambda_K + \tfrac12(\omega_K + 1),\ \lambda_K + \omega_K + 1, \ldots, N, \tag{9.1}$$

where $\lambda_k + \frac12(\omega_k + 1)$ occurs $\omega_k$ times. Under the null hypothesis the ordered
N-tuple of $X_j^{(i)}$'s takes as its values the permutations of the above finite
population, all with equal probability. We compute that $ER_i = \frac12 n_i(N + 1)$, as before,
and that

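$$\operatorname{Var} R_i = \frac{n_i(N - n_i)(N+1)}{12} - \frac{n_i(N - n_i)}{12N(N-1)}\sum_{k=1}^{K}(\omega_k - 1)\omega_k(\omega_k + 1),$$

$$\operatorname{Cov}(R_i, R_{i'}) = -\frac{n_in_{i'}(N+1)}{12} + \frac{n_in_{i'}}{12N(N-1)}\sum_{k=1}^{K}(\omega_k - 1)\omega_k(\omega_k + 1) \quad (i \neq i'),$$

so that, setting

$$\gamma = \sum_{k=1}^{K}(\omega_k - 1)\omega_k(\omega_k + 1),$$

we have

$$E[(R_i - ER_i)(R_{i'} - ER_{i'})] = \frac{N^3 - N - \gamma}{12N(N-1)}\,(Nn_i\delta_{ii'} - n_in_{i'}),$$

or the corresponding second moment in the untied case times
$(N^3 - N - \gamma)/(N^3 - N)$.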
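The tie-adjusted moments can be confirmed by enumeration. The sketch below uses hypothetical helpers, and the pooled values are an arbitrary illustration containing one triple and one pair of ties. It assigns mean ranks, enumerates all equally likely permutations of the resulting finite population, and compares the exact covariances of the rank sums with $(N^3 - N - \gamma)(Nn_i\delta_{ii'} - n_in_{i'})/[12N(N-1)]$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def mean_ranks(values):
    """Mean ranks: a tied group occupying ranks lam+1..lam+w gets lam + (w+1)/2."""
    order = sorted(values)
    mid, i = {}, 0
    while i < len(order):
        j = i
        while j < len(order) and order[j] == order[i]:
            j += 1
        mid[order[i]] = Fraction(i + 1 + j, 2)     # here lam = i and w = j - i
        i = j
    return [mid[v] for v in values]


def exact_cov(population, sizes):
    """Exact means and covariances of the rank sums under random permutation."""
    C = len(sizes)
    s1 = [Fraction(0)] * C
    s2 = [[Fraction(0)] * C for _ in range(C)]
    count = 0
    for perm in permutations(population):
        R, start = [], 0
        for n in sizes:
            R.append(sum(perm[start:start + n]))
            start += n
        count += 1
        for a in range(C):
            s1[a] += R[a]
            for b in range(C):
                s2[a][b] += R[a] * R[b]
    mean = [s / count for s in s1]
    return mean, [[s2[a][b] / count - mean[a] * mean[b] for b in range(C)] for a in range(C)]


pooled = [1, 2, 2, 2, 3, 3, 4]
sizes = (3, 2, 2)
pop = mean_ranks(pooled)
N = len(pop)
gamma = sum(w ** 3 - w for w in Counter(pooled).values())   # sum of (w-1) w (w+1)
mean, cov = exact_cov(pop, sizes)
formula = [[Fraction(N ** 3 - N - gamma, 12 * N * (N - 1)) * (N * a * (i == k) - a * b)
            for k, b in enumerate(sizes)] for i, a in enumerate(sizes)]
print(cov == formula, gamma)   # True 30
```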


Now let the $\lambda_k$'s, the $\omega_k$'s and the $n_i$'s all be functions of N, and assume that
$\lim_{N\to\infty} n_i/N = \nu_i > 0$ exists, $\lim_{N\to\infty}\gamma/N^3 = \gamma^*$ exists, and $\gamma^* \neq 1$. To say that
$\gamma^* \neq 1$ is to say that $\max_k \omega_k/N$ does not approach 1, and one can readily show
then that the sequence of finite populations (9.1) satisfies condition W of
Theorem 4W. It follows from Theorem 4W that the variables

$$\frac{R_i - \frac12 n_i(N+1)}{\sqrt{(N^3 - N - \gamma)/12}}$$
are asymptotically multivariate normal with zero mean and covariance matrix
given by (4.1). Hence, just as in Section 4,

$$\frac{12}{N(N+1) - \gamma/(N-1)}\sum_{i=1}^{C}\frac{1}{n_i}\left(R_i - \frac{n_i(N+1)}{2}\right)^2 = H' \tag{9.2}$$

(say) is asymptotically chi-square with $C-1$ degrees of freedom and has
expected value $C-1$. Note that no limit condition on the $\lambda_k$'s is needed.
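A minimal sketch of the tie-corrected statistic of (9.2), using mean ranks and the constant $12/[N(N+1) - \gamma/(N-1)]$. Equivalently, it is the mean-rank version of H divided by $1 - \gamma/(N^3 - N)$. The function name is hypothetical, at least two observations are assumed, and without ties ($\gamma = 0$) the value reduces to H.

```python
from collections import Counter
from fractions import Fraction


def tie_corrected_h(samples):
    """Statistic (9.2): mean ranks, normalized by 12 / (N(N+1) - gamma/(N-1))."""
    pooled = sorted(v for s in samples for v in s)
    N = len(pooled)
    counts = Counter(pooled)
    first = {}
    for pos, v in enumerate(pooled):
        first.setdefault(v, pos)                 # lam = number of smaller values
    mid = {v: first[v] + Fraction(counts[v] + 1, 2) for v in counts}
    gamma = sum(w ** 3 - w for w in counts.values())
    total = Fraction(0)
    for s in samples:
        R = sum(mid[v] for v in s)
        total += (R - Fraction(len(s) * (N + 1), 2)) ** 2 / len(s)
    return 12 * total / (N * (N + 1) - Fraction(gamma, N - 1))


print(tie_corrected_h([[1, 2, 2], [2, 3], [3, 4]]))   # 223/51
```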

10. Further work. It would be interesting to investigate further the power
function of the test described in this paper, perhaps along the lines of [1], or by
considering its asymptotic relative efficiency to ordinary one-way analysis of
variance in the normal case. Again in the spirit of [1], it would seem desirable
to propose and investigate related tests specifically designed to be powerful
against more restricted classes of alternatives, e.g., $F_1 \geq F_2 \geq \cdots \geq F_C$, with
at least one inequality strong.³ Another extension is to consider the use of H-like
tests in two-way analyses of variance or more general linear hypothesis
situations, in a manner analogous to that of [4].
TESTING MULTIPARAMETER HYPOTHESES¹


By E. L. Lehmann
Stanford University and University of California, Berkeley


1. Summary. Let the distribution of some random variables depend on real
parameters $\theta_1, \ldots, \theta_s$, and consider the hypothesis $H: \theta_i \leq \theta_i^*$, $i = 1, \ldots, s$.
It is shown under certain regularity assumptions that unbiased tests of H do not
exist. Tests of minimum bias and other types of minimax tests are derived under
suitable monotonicity conditions. Certain related multidecision problems are
discussed and two-sided hypotheses are considered very briefly.


2. Introduction. The extensive literature on optimum tests has been concerned
mainly with hypotheses specifying a set of values for a single real-valued
parameter. Important exceptions are some cases that can be reduced to the
one-parameter situation by the principle of invariance, such as the linear (univariate)
hypothesis and Hotelling's $T^2$-problem. These have been used to illustrate a
number of different principles, the successful application of which however seems
to rest on the symmetry whose full exploitation makes the problems
uniparametric. Another exception is the theory of tests with local optimum properties,
initiated by Neyman and Pearson [1] and recently developed further by Isaacson [2].

We shall here concern ourselves mainly with hypotheses which, rather than 
specifying the values of the parameter in question, state that these parameters 
do not exceed certain bounds. The following examples illustrate the way in 
which such problems arise. 

Example 2.1. Let p and p' denote the number of major and minor defects in a
lot. Then the lot will be considered acceptable provided $p \leq p_0$ and $p' \leq p_0'$,
where $p_0 < p_0'$.

Example 2.2. It may be desired to compare some new treatments with a
standard one. Here the hypothesis would specify that none of the new
treatments is better by more than a given amount than the standard.

Example 2.3. Let $x_1, \ldots, x_n$ be a sample from a normal distribution with
mean ξ and variance σ². The population in question may be considered adequate
if $\xi \leq \xi_0$ and $\sigma \leq \sigma_0$.

In some of the above examples we are dealing with bona fide testing problems,
while in others we are faced with a choice among more than two decisions. Which
of these is the case cannot always be seen from the mathematical formulation
alone. Thus in Example 2.1 it clearly depends on the disposition that is made of
a rejected lot. If there is complete screening, the reason for rejection is
immaterial. If on the other hand a lot rejected for major defects is treated differently
from one rejected only for minor defects, the decision problem becomes more
complicated.


¹ Work done under the sponsorship of the Office of Naval Research.
We shall in the following concern ourselves mainly with the one-sided case
of the straightforward testing problem. The two-sided situation and the
multidecision problem will be discussed only rather briefly. For simplicity we shall
take the number of parameters to be two. The extension to the higher
dimensional cases is immediate.


3. Unbiasedness and the minimax principle. The success of the concept of
unbiasedness in the one-parameter case suggests the use of this approach also
for the present problems. Unfortunately it turns out that in general unbiased
tests of the hypotheses in question do not exist. Let us consider the case of two
parameters $\theta_1, \theta_2$ and the hypothesis $H: \theta_1 \leq \theta_1^*, \theta_2 \leq \theta_2^*$. We shall assume that
the power function $\beta(\theta_1, \theta_2)$ of any test is analytic in $\theta_1$ and $\theta_2$ in the sense that
it can be expanded in an absolutely convergent double power series. Then we
shall show that for any unbiased test we have $\beta(\theta_1, \theta_2) \equiv \alpha$, so that any unbiased
test is equivalent to the trivial one that rejects with probability α regardless of
the observations. This incidentally proves this trivial and most unsatisfactory
test to be admissible for the problem under consideration.

Without loss of generality assume that $\theta_1^* = \theta_2^* = 0$. Then unbiasedness
states that $\beta(\theta_1, \theta_2) \leq \alpha$ in the third quadrant of the $(\theta_1, \theta_2)$-plane, and $\geq \alpha$ in
the other three quadrants. By continuity we have $\beta(\theta_1, 0) = \alpha$ for $\theta_1 \leq 0$ and
hence by analyticity $\beta(\theta_1, 0) = \alpha$ for all $\theta_1$. Analogously $\beta(0, \theta_2) = \alpha$. Consider
now $\beta(\theta_1, \theta_2)$ for any fixed $\theta_2 > 0$ as a function of $\theta_1$. It has a minimum at $\theta_1 = 0$,
so that $\partial\beta(\theta_1, \theta_2)/\partial\theta_1|_{\theta_1=0} = 0$ for all $\theta_2 \geq 0$. Since $\partial\beta(\theta_1, \theta_2)/\partial\theta_1|_{\theta_1=0}$ is again
analytic in $\theta_2$, it follows that it is identically zero. Consider
now $\beta(\theta_1, \theta_2)$ for some fixed value $\theta_2 \leq 0$. Since $\beta(\theta_1, \theta_2) \geq \alpha$ for $\theta_1 \geq 0$ and $\leq \alpha$ for $\theta_1 \leq 0$, and since
at $\theta_1 = 0$ the derivative is zero, $\beta(\theta_1, \theta_2)$ must have a point of inflection at 0
and consequently the second derivative $\partial^2\beta(\theta_1, \theta_2)/\partial\theta_1^2|_{\theta_1=0} = 0$ for all $\theta_2
< 0$ and hence for all $\theta_2$. Since the order of the first non-vanishing derivative
$\partial^k\beta(\theta_1, \theta_2)/\partial\theta_1^k|_{\theta_1=0}$ must be even for $\theta_2 > 0$ and odd for $\theta_2 < 0$, we see in this
manner that for any fixed $\theta_2$, $\partial^k\beta(\theta_1, \theta_2)/\partial\theta_1^k|_{\theta_1=0} = 0$ for all $k = 1, 2, \ldots$. By
analyticity it follows for each fixed $\theta_2$ that $\beta(\theta_1, \theta_2)$ must be a constant, that is,
be independent of $\theta_1$. By symmetry it now follows that $\beta(\theta_1, \theta_2)$ must be
identically constant, as was to be proved.

We digress for a moment from our search for a reasonable test of the hypothesis $\theta_1, \theta_2 \le 0$. There do exist non-trivial tests of $H$ satisfying the condition of similarity

$$\beta(\theta_1, 0) = \beta(0, \theta_2) = \alpha$$

for all $\theta_1, \theta_2$. Suppose, for example, that $X$ and $Y$ are independently distributed with joint density $f_{\theta_1}(x) f_{\theta_2}(y)$, and that $\alpha = 1/m$ where $m$ is an integer. A particularly simple class of similar regions can then be obtained as follows. Let $S_1, \ldots, S_m$ be mutually exclusive and exhaustive sets on the real line such that

$$\int_{S_i} f_0(z)\, dz = \alpha, \qquad i = 1, \ldots, m.$$

In the $(x, y)$-plane define the set $w_i$ to be the Cartesian product of $S_i$ with itself, and let $w = w_1 + \cdots + w_m$. When $\theta_2 = 0$, $X$ is a sufficient statistic for $\theta_1$. Given $X = x \in S_i$, the point $(X, Y)$ lies in $w$ exactly when $Y \in S_i$, an event of probability $\alpha$ when $\theta_2 = 0$. Hence for every $x$

$$P_{\theta_1, 0}((X, Y) \in w \mid x) = \alpha,$$

and therefore $P_{\theta_1, 0}((X, Y) \in w) = \alpha$.
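A quick numerical check of this construction may help. The sketch below is ours and rests on assumptions not in the text:

- $f_\theta$ is taken to be the $N(\theta, 1)$ density;
- the sets $S_i$ are the intervals between consecutive $i/m$ quantiles of $f_0$;
- the function name `similar_region_power` is hypothetical.

Since $X$ and $Y$ are independent, $P((X, Y) \in w) = \sum_i P_{\theta_1}(X \in S_i)\, P_{\theta_2}(Y \in S_i)$, and the code evaluates this sum exactly with the standard library.

```python
from statistics import NormalDist


def similar_region_power(theta1: float, theta2: float, m: int) -> float:
    """Rejection probability of w = S_1 x S_1 + ... + S_m x S_m when
    X ~ N(theta1, 1) and Y ~ N(theta2, 1) are independent.

    S_1, ..., S_m are the intervals between consecutive i/m quantiles of
    N(0, 1), so each has probability 1/m = alpha under f_0.
    """
    std = NormalDist()
    cuts = [float("-inf")] + [std.inv_cdf(i / m) for i in range(1, m)] + [float("inf")]
    x_dist, y_dist = NormalDist(theta1, 1.0), NormalDist(theta2, 1.0)
    power = 0.0
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        p_x = x_dist.cdf(hi) - x_dist.cdf(lo)  # P(X in S_i)
        p_y = y_dist.cdf(hi) - y_dist.cdf(lo)  # P(Y in S_i)
        power += p_x * p_y
    return power


if __name__ == "__main__":
    m = 4  # alpha = 1/4
    for theta1 in (-2.0, 0.0, 1.5):
        # On the line theta2 = 0 the size stays at alpha, whatever theta1 is.
        print(theta1, round(similar_region_power(theta1, 0.0, m), 6))
    # Off the axes the rejection probability is not controlled, e.g. at (2, -2):
    print(round(similar_region_power(2.0, -2.0, m), 4))
```

Each line of the loop prints 0.25. At $(2, -2)$ the rejection probability falls well below $\alpha$, so similarity alone does not make such a region unbiased.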

We now return to the original problem and investigate the test that maximizes the minimum power over a certain class $\omega'$ of alternatives. For $\omega'$ we take the set of points $(\theta_1, \theta_2)$ for which either $\theta_1 = \theta_1^*$ or $\theta_2 = \theta_2^*$. Consider first, as an example, the case that $X$ and $Y$ are independently and normally distributed, with known variances and with means $\theta_1$ and $\theta_2$, respectively. It would then seem that any reasonable test should satisfy the following two conditions:

(i) $\beta(\theta_1, \theta_2) \le \beta(\theta_1', \theta_2')$ whenever $\theta_1 \le \theta_1'$, $\theta_2 \le \theta_2'$;

(ii) $\varphi(x, y) \le \varphi(x', y')$ whenever $x \le x'$, $y \le y'$, where $\varphi$ denotes the critical function.

It is easily seen that (ii) implies (i). We shall now show that the test $\varphi$ that maximizes $\inf_{\omega'} \beta(\theta_1, \theta_2)$ does not possess property (i), and hence not (ii) either. This holds provided $\theta_1^* - \theta_1^0$ and $\theta_2^* - \theta_2^0$ are large enough that $\varphi$ is not identically equal to $\alpha$.

Let $\beta$ denote the power function of $\varphi$, and suppose that $\inf_{\omega'} \beta(\theta_1, \theta_2) = \gamma > \alpha$. Condition (i) then implies that, under the hypothesis, $\beta(\theta_1, \theta_2) = \alpha$ only when $\theta_1 = \theta_1^0$ and $\theta_2 = \theta_2^0$. For if $\beta(\theta_1, \theta_2) = \alpha$ at some other point of $H$, it would also equal $\alpha$ on the line segment connecting these two points, and hence, by analyticity, on the whole line containing this segment. But this would imply that $\beta(\theta_1, \theta_2) = \alpha$ also for points in $\omega'$, where by assumption $\beta(\theta_1, \theta_2) \ge \gamma > \alpha$.

Another consequence of condition (i) is that $\beta(\theta_1, \theta_2) > \gamma$ for all points in $\omega'$. The minimum $\gamma$ is therefore never attained in $\omega'$ and is approached only as either $\theta_1$ or $\theta_2$ tends to $-\infty$. Suppose, for example, that $\beta(\theta_1, \theta_2) = \gamma$ at some point with $\theta_1 = \theta_1^*$ and finite $\theta_2 = \theta_2'$. Then $\beta(\theta_1^*, \theta_2) = \gamma$ for all $\theta_2 \le \theta_2'$, and hence for all $\theta_2$. This would imply $\beta(\theta_1, \theta_2) = \gamma$ for all $(\theta_1, \theta_2)$ with $\theta_1 \ge \theta_1^*$, $\theta_2 \le \theta_2^*$, and therefore $\beta(\theta_1, \theta_2) \equiv \gamma$.

From these two remarks and Theorem 3.10 of Wald's book Statistical Decision Functions [3], it can be shown that there exists a sequence $\lambda_i$ of probability distributions over $\omega'$ with the following properties:

(a) For any real number $A$, the probability under $\lambda_i$ of the intersection of $\omega'$ with the quadrant $\{(\theta_1, \theta_2) \mid \theta_1, \theta_2 \ge A\}$ tends to zero as $i \to \infty$.

(b) Consider the most powerful level $\alpha$ test of $H': \theta_1 = \theta_1^0, \theta_2 = \theta_2^0$ against the simple alternative $\int p_{\theta_1, \theta_2}(x, y)\, d\lambda_i(\theta_1, \theta_2)$. Its power tends to $\gamma$ as $i \to \infty$.

But from (a) it follows easily that, as $i \to \infty$, $\int p_{\theta_1, \theta_2}(x, y)\, d\lambda_i(\theta_1, \theta_2)$ can be distinguished arbitrarily well from $p_{\theta_1^0, \theta_2^0}(x, y)$. This leads to the contradiction $\gamma = 1$.
We have given the proof explicitly only for the case of independent normal 
variables. However it applies equally well to any problem in which, in addition 
to the analyticity assumption of the present section, also the assumptions of
Theorem 4.1 are satisfied.


4. Monotone regions. In the previous section we showed that, for the hypotheses under consideration, neither unbiasedness nor the minimax principle leads to desirable results. To arrive at a reasonable test, we now impose the following preliminary conditions, suggested by the negative results of the last section.

We ask first that the test be nonrandomized, so that we can speak of a region $w$ of rejection. The second restriction is one of monotonicity. Assume that we are concerned with two random variables $X$ and $Y$ whose joint distribution is given by $p_{\theta_1, \theta_2}(x, y)$. We shall say that the region $w$ of rejection for the hypothesis $H: \theta_1 \le \theta_1^0, \theta_2 \le \theta_2^0$ is monotone (nondecreasing in $x$ and $y$) if

$$(x, y) \in w,\ x \le x',\ y \le y' \quad\text{imply}\quad (x', y') \in w, \tag{4.1}$$

that is, if its critical function is nondecreasing in both variables.
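Condition (4.1) can be checked mechanically for a region given as a yes/no rule. The helper below is a hypothetical illustration, not from the paper. It works on a finite grid and tests only single steps to the right and upward. On a grid this suffices, because any move to a point $(x', y')$ with $x \le x'$, $y \le y'$ is a chain of such steps.

```python
from typing import Callable, Sequence


def is_monotone_on_grid(reject: Callable[[float, float], bool],
                        xs: Sequence[float], ys: Sequence[float]) -> bool:
    """Check (4.1) on the grid xs x ys (both sorted ascending): whenever a grid
    point is rejected, its right and upper neighbours must be rejected too."""
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if not reject(x, y):
                continue
            if i + 1 < len(xs) and not reject(xs[i + 1], y):
                return False
            if j + 1 < len(ys) and not reject(x, ys[j + 1]):
                return False
    return True


grid = [k / 4 for k in range(-12, 13)]
print(is_monotone_on_grid(lambda x, y: x > 1.0 or y > 0.5, grid, grid))  # True
print(is_monotone_on_grid(lambda x, y: abs(x) > 1.0, grid, grid))        # False
```

The complement of a rectangle $x \le a$, $y \le b$ passes. A two-sided rule such as $|x| > 1$ fails, because it also rejects for very small $x$.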

The restriction to monotone regions is of course suitable only in certain problems: roughly speaking, those in which increased values of the parameters lead to higher values of the observations. To make this precise, let $\theta_1 \le \theta_1'$, $\theta_2 \le \theta_2'$, and let $F$ and $G$ be the joint cumulative distribution functions of $X$ and $Y$ corresponding to $(\theta_1, \theta_2)$ and $(\theta_1', \theta_2')$, respectively. We shall consider the condition of monotonicity appropriate provided that, for every monotone non-decreasing region $w$,

$$\int_w dF \le \int_w dG. \tag{4.2}$$

Frequently the simplest way to prove (4.2) is to establish the existence of functions $x' = f(x, y)$, $y' = g(x, y)$ with $x' \ge x$, $y' \ge y$, such that when $F$ is the cumulative distribution function of $(X, Y)$, that of $(X', Y')$ is $G$.

Sometimes it is more convenient instead to prove the existence of random variables, say $Z_1, \ldots, Z_r$, and functions

$$X = f(Z_1, \ldots, Z_r),\quad Y = g(Z_1, \ldots, Z_r),\quad X' = f'(Z_1, \ldots, Z_r),\quad Y' = g'(Z_1, \ldots, Z_r)$$

such that $X \le X'$, $Y \le Y'$ and the cdf's of $(X, Y)$ and $(X', Y')$ are $F$ and $G$, respectively.

Both of these conditions clearly assure the validity of (4.2). For any $w$ that is nondecreasing in $x$ and $y$, $(X, Y) \in w$ implies $(X', Y') \in w$ by (4.1), and so

$$\int_w dF = P((X, Y) \in w) \le P((X', Y') \in w) = \int_w dG. \tag{4.3}$$
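The coupling argument behind (4.3) can be illustrated by simulation. Purely as an assumed example, take $X$ and $Y$ independent normal with unit variance and means $\theta_1$, $\theta_2$. Set $X' = X + (\theta_1' - \theta_1)$ and $Y' = Y + (\theta_2' - \theta_2)$, so that $x' \ge x$ and $y' \ge y$.

For the monotone region $w = \{\max(x, y) > c\}$ (an arbitrary choice), every draw that lands in $w$ under $(\theta_1, \theta_2)$ also lands in $w$ after the shift. The empirical frequencies therefore inherit the inequality (4.3) draw by draw. The function name is hypothetical.

```python
import random


def coupled_rejection_counts(theta, theta_prime, c, n_draws=100_000, seed=1):
    """Count draws falling in the monotone region w = {max(x, y) > c} for
    (X, Y) and for the shifted pair (X', Y') built from the same draw.

    Illustrative model (an assumption): X ~ N(theta1, 1), Y ~ N(theta2, 1)
    independent, and X' = X + (theta1' - theta1), Y' = Y + (theta2' - theta2),
    so (X', Y') has the distribution G belonging to theta'.
    """
    (t1, t2), (s1, s2) = theta, theta_prime
    if not (t1 <= s1 and t2 <= s2):
        raise ValueError("need theta <= theta_prime componentwise")
    rng = random.Random(seed)
    count = count_prime = 0
    for _ in range(n_draws):
        x, y = rng.gauss(t1, 1.0), rng.gauss(t2, 1.0)
        x_p, y_p = x + (s1 - t1), y + (s2 - t2)  # x' >= x and y' >= y
        count += max(x, y) > c
        count_prime += max(x_p, y_p) > c
    return count, count_prime


n, n_prime = coupled_rejection_counts((0.0, 0.0), (0.5, 0.2), c=1.5, n_draws=20_000)
print(n / 20_000, n_prime / 20_000)
```

The two frequencies estimate the two sides of (4.3). By construction, the second count can never be smaller than the first.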

A remark is also required in connection with the restriction to nonrandomized tests. When dealing with discrete problems, for example binomial distributions, we must permit a certain rather trivial kind of randomization. A formal way of handling the distinction is provided by a representation of randomized tests due to M. Eudey [4].

Let $X$ denote the number of successes in $n$ binomial trials, and let $U$ be uniformly distributed over $[0, 1]$. Any randomized test in $X$ is then equivalent to a non-randomized test in $X + U$. We shall consider monotone non-randomized tests in the continualized variables $X + U$, $Y + V$. Here monotonicity insures that no very heavy use is made of randomization. In fact, in the original variables $X$, $Y$, randomization will occur only on the boundary of the critical region.
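Eudey's device can be made concrete for a single binomial variable. In the sketch below (our own illustration; the function names are hypothetical), a one-sided randomized test of $p \le p_0$ rejects for $X > c$, and with probability $\gamma$ at $X = c$. The non-randomized test "reject when $X + U > c + 1 - \gamma$" has the same rejection probability for every $p$, because $P(U > 1 - \gamma) = \gamma$.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def randomized_upper_test(n: int, p0: float, alpha: float):
    """Return (c, gamma) for the size-alpha test phi(x) = 1 if x > c,
    gamma if x == c, 0 if x < c, for X ~ Binomial(n, p0)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    tail = 0.0  # P(X > c) under p0
    for c in range(n, -1, -1):
        pmf_c = binom_pmf(c, n, p0)
        if tail + pmf_c > alpha:
            return c, (alpha - tail) / pmf_c
        tail += pmf_c
    raise RuntimeError("unreachable for 0 < alpha < 1")


def randomized_size(n: int, p: float, c: int, gamma: float) -> float:
    """E_p[phi(X)] for the randomized test."""
    upper = sum(binom_pmf(k, n, p) for k in range(c + 1, n + 1))
    return upper + gamma * binom_pmf(c, n, p)


def continualized_size(n: int, p: float, t: float) -> float:
    """P_p(X + U > t) with U ~ Uniform[0, 1] independent of X: the size of the
    non-randomized test that rejects when X + U > t."""
    return sum(binom_pmf(k, n, p) * min(1.0, max(0.0, k + 1 - t))
               for k in range(n + 1))


c, gamma = randomized_upper_test(10, 0.3, 0.05)
t = c + 1 - gamma  # threshold for X + U that reproduces phi
for p in (0.2, 0.3, 0.5):
    print(p, randomized_size(10, p, c, gamma), continualized_size(10, p, t))
```

In the original variable, the only point at which the equivalent test randomizes is $X = c$, the boundary of the critical region.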

We shall now derive the test of the hypothesis $H: \theta_1 \le \theta_1^0, \theta_2 \le \theta_2^0$ that, among all monotone regions, maximizes $\inf_{\omega'} \beta(\theta_1, \theta_2)$. Here $\omega'$ is the set of points $(\theta_1, \theta_2)$ with $\theta_1 = \theta_1^*$ or $\theta_2 = \theta_2^*$, where $\theta_1^* \ge \theta_1^0$ and $\theta_2^* \ge \theta_2^0$. Note that if we let $\theta_1^* = \theta_1^0$, $\theta_2^* = \theta_2^0$, we get the monotone region with minimum bias.

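THEOREM 4.1. Let the joint density of $X$ and $Y$ be $p_{\theta_1, \theta_2}(x, y)$, where:

- the parameter space is a finite or infinite open rectangle $\underline{\theta}_1 < \theta_1 < \bar{\theta}_1$, $\underline{\theta}_2 < \theta_2 < \bar{\theta}_2$;
- the positive sample space is also an open rectangle $\underline{x} < x < \bar{x}$, $\underline{y} < y < \bar{y}$, independent of the $\theta$'s.

Suppose that (4.2) holds, that the marginal distribution of $X$ depends only on $\theta_1$ and that of $Y$ only on $\theta_2$, and that $X$ tends in probability to $\underline{x}$ as $\theta_1 \to \underline{\theta}_1$, while $Y$ tends to $\underline{y}$ as $\theta_2 \to \underline{\theta}_2$. Then the test that, among all monotone nonrandomized tests of $H$, maximizes the minimum power against $\omega'$ is given by the region of acceptance $S$:

$$x \le a, \quad y \le b, \tag{4.4}$$

where $a$ and $b$ are determined by the conditions

$$P(X \le a, Y \le b \mid \theta_1^0, \theta_2^0) = 1 - \alpha \tag{4.5}$$

and

$$P(X \le a \mid \theta_1^*) = P(Y \le b \mid \theta_2^*). \tag{4.6}$$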
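For a concrete case, the two conditions can be solved numerically. Beyond the theorem, the sketch assumes that $X$ and $Y$ are independent normal with unit variance and means $\theta_1, \theta_2$, reading $\underline{x} = -\infty$ and $\underline{\theta}_1 = -\infty$ (and similarly for $Y$). The function name is hypothetical. Under independence, (4.6) reduces to $a - \theta_1^* = b - \theta_2^*$, and (4.5) then becomes a single monotone equation in $a$.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def minimax_monotone_cutoffs(alpha, theta0=(0.0, 0.0), theta_star=(1.0, 1.0)):
    """Solve (4.5)-(4.6) for independent X ~ N(theta1, 1), Y ~ N(theta2, 1).

    (4.6) reads Phi(a - theta1*) = Phi(b - theta2*), i.e. b = a + theta2* - theta1*.
    The left side of (4.5) is then increasing in a, so a is found by bisection.
    Returns (a, b, minimum power over omega').
    """
    (t10, t20), (t1s, t2s) = theta0, theta_star
    shift = t2s - t1s  # b = a + shift

    def accept_prob(a):  # left side of (4.5) under independence
        return PHI(a - t10) * PHI(a + shift - t20)

    lo, hi = -50.0, 50.0  # assumes all parameters are of moderate size
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if accept_prob(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    b = a + shift
    # Limiting power as theta2 -> -infinity along theta1 = theta1*: P(X > a | theta1*).
    return a, b, 1 - PHI(a - t1s)


a, b, gamma = minimax_monotone_cutoffs(0.05, theta0=(0.0, 0.0), theta_star=(0.5, 1.5))
print(a, b, gamma)
```

The returned minimum power is $P(X > a \mid \theta_1^*)$, which by (4.6) equals $P(Y > b \mid \theta_2^*)$.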


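PROOF. We point out first that, for any $x > \underline{x}$, $y > \underline{y}$,

$$\lim_{\theta_1 \to \underline{\theta}_1} P(X \le x, Y \le y \mid \theta_1, \theta_2) = P(Y \le y \mid \theta_2), \qquad \lim_{\theta_2 \to \underline{\theta}_2} P(X \le x, Y \le y \mid \theta_1, \theta_2) = P(X \le x \mid \theta_1). \tag{4.7}$$

For the second relation (the first is analogous), note that

$$P(X \le x, Y \le y) = P(X \le x) - P(X \le x, Y > y), \qquad 0 \le P(X \le x, Y > y) \le P(Y > y),$$

and

$$\lim_{\theta_2 \to \underline{\theta}_2} P(Y > y \mid \theta_2) = 0.$$

For any monotone test, the limit of the power $\beta(\theta_1, \theta_2)$ as $\theta_1 \to \underline{\theta}_1$ clearly exists; we shall denote it by $\beta(\underline{\theta}_1, \theta_2)$. The minimum power in $\omega'$ is then the smaller of the two quantities $\beta(\underline{\theta}_1, \theta_2^*)$ and $\beta(\theta_1^*, \underline{\theta}_2)$. For the test given by (4.4) we have, by (4.6) and (4.7), $\beta(\underline{\theta}_1, \theta_2^*) = \beta(\theta_1^*, \underline{\theta}_2)$. Hence, if the theorem were false, we could increase both $\beta(\underline{\theta}_1, \theta_2^*)$ and $\beta(\theta_1^*, \underline{\theta}_2)$.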
Clearly any monotone test has a region of acceptance $S'$ of the form $y \le g(x)$, or equivalently $x \le h(y)$, where $g$ and $h$ are non-increasing. If $S' \ne S$ is to be of size $\le \alpha$, we have $P(S' \mid \theta_1^0, \theta_2^0) \ge P(S \mid \theta_1^0, \theta_2^0)$. Hence either $\lim_{x \to \underline{x}} g(x) > b$ or $\lim_{y \to \underline{y}} h(y) > a$.

Suppose that the first of these conditions is satisfied, and denote the complements of $S$ and $S'$ by $\bar{S}$ and $\bar{S}'$, respectively. Then there exist:

- a constant $k$ such that $x \ge k$ for all points in $S \cap \bar{S}'$;
- a subset $S''$ of $\bar{S} \cap S'$ such that $P(S'' \mid \theta_1, \theta_2) > 0$ for all $\theta_1, \theta_2$, and $x \le k$ for all points in $S''$.

Hence by (4.7)

$$P(S \cap \bar{S}' \mid \underline{\theta}_1, \theta_2^*) = 0$$

and

$$P(\bar{S} \cap S' \mid \underline{\theta}_1, \theta_2^*) \ge P\bigl(b < Y \le \lim_{x \to \underline{x}} g(x) \mid \theta_2^*\bigr) > 0,$$

which leads to

$$P(S' \mid \underline{\theta}_1, \theta_2^*) > P(S \mid \underline{\theta}_1, \theta_2^*),$$

and hence to the desired result.

The theorem becomes particularly simple if the joint density of $X$ and $Y$ is symmetric in its two variables when $\theta_1 = \theta_2$. For then, if $\theta_1^0 = \theta_2^0 = \theta^0$ and $\theta_1^* = \theta_2^* = \theta^*$, it is seen from (4.6) that $a = b$. The test thus accepts if $\max(X, Y) \le a$, where $a$ is determined by (4.5), and hence is independent of $\theta^*$.
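As a worked instance (an illustrative assumption, not part of the theorem), let $X$ and $Y$ be independent normal with unit variance and means $\theta_1, \theta_2$, with $\theta^0 = 0$. Then (4.5) with $a = b$ becomes a single equation in $a$:

$$
\Phi(a)^2 = 1 - \alpha
\quad\Longrightarrow\quad
a = \Phi^{-1}\bigl(\sqrt{1 - \alpha}\bigr).
$$

For $\alpha = 0.05$ this gives $a = \Phi^{-1}(0.9747) \approx 1.95$, the same value for every choice of $\theta^*$.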

The assumptions made in Theorem 4.1 concerning the shape of the parameter 
and sample spaces are unnecessarily restrictive. The theorem remains valid if 
we assume that both the parameter space and the positive sample space are 
convex open sets. The proof is essentially the same; however, the notation
becomes considerably more complicated.

If the roles of hypothesis and class of alternatives are interchanged, we obtain the following result.

THEOREM 4.2. For testing the hypothesis $H': \theta_1 = \theta_1^*$ or $\theta_2 = \theta_2^*$ against the class of alternatives $\omega: \theta_1 \le \theta_1^0, \theta_2 \le \theta_2^0$, let $S$ be the region of rejection $x \le c$, $y \le d$, where $c$ and $d$ are determined by

$$P(X \le c \mid \theta_1^*) = P(Y \le d \mid \theta_2^*) = \alpha. \tag{4.8}$$

Then, under the assumptions of Theorem 4.1, $S$ is uniformly most powerful among all regions of rejection that are monotone non-increasing in both variables.

PROOF. Consider any monotone region given by $x \le g(y)$, or $y \le h(x)$, with $g$ and $h$ non-increasing. The probability of rejection must not exceed $\alpha$ at $(\theta_1^*, \underline{\theta}_2)$ and at $(\underline{\theta}_1, \theta_2^*)$, so we must have

$$\lim_{y \to \underline{y}} g(y) \le c, \qquad \lim_{x \to \underline{x}} h(x) \le d. \tag{4.9}$$

But any monotone region satisfying (4.9) is contained in $S$, and hence is uniformly less powerful than $S$.
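Under the same kind of illustrative normal model (independent $X$, $Y$ with unit variance and means $\theta_1, \theta_2$; an assumption, not part of Theorem 4.2), the cut-offs in (4.8) are explicit and the power of $S$ factorizes:

$$
c = \theta_1^* + \Phi^{-1}(\alpha), \qquad d = \theta_2^* + \Phi^{-1}(\alpha), \qquad
P(X \le c,\ Y \le d \mid \theta_1, \theta_2) = \Phi(c - \theta_1)\,\Phi(d - \theta_2).
$$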


5. Examples. In the present section we shall apply Theorem 4.1 to some specific problems. All other assumptions being trivially satisfied, in each case we shall check only condition (4.2).

Example 5.1. Let $X$, $Y$ have a multinomial distribution

$$P(X = x, Y = y) = \frac{n!}{x!\, y!\, (n - x - y)!}\, \theta_1^x \theta_2^y (1 - \theta_1 - \theta_2)^{n - x - y}. \tag{4.10}$$

To see that (4.2) holds, let $Z_1, \ldots, Z_n$ be independently and uniformly distributed on $(0, 1)$. Let $U$, $V$ denote the number of $Z$'s in the intervals $(0, \theta_1)$ and $(1 - \theta_2, 1)$, respectively, and define $U'$, $V'$ analogously with respect to $\theta_1'$, $\theta_2'$. It is seen that $(U, V)$ has the distribution (4.10), while $(U', V')$ has the corresponding distribution for $\theta_1'$, $\theta_2'$. Since $U \le U'$ and $V \le V'$, the validity of (4.2) follows.

The same proof works of course in the case that X and Y are independently 
distributed, each according to a binomial distribution. 
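The coupling in Example 5.1 is easy to simulate. In this sketch (the function name and parameter values are illustrative assumptions), the same uniforms generate both pairs. Hence $U \le U'$ and $V \le V'$ hold draw by draw, and any monotone region containing $(U, V)$ also contains $(U', V')$.

```python
import random


def coupled_multinomial(n, theta, theta_prime, seed=0):
    """Build (U, V) and (U', V') from the same uniforms Z_1, ..., Z_n.

    U counts Z's in (0, theta1), V counts Z's in (1 - theta2, 1); U', V' use
    theta' instead. Needs theta1' + theta2' <= 1 so the two intervals do not
    overlap, which makes (U, V) multinomial as in (4.10).
    """
    (t1, t2), (s1, s2) = theta, theta_prime
    if not (t1 <= s1 and t2 <= s2 and s1 + s2 <= 1):
        raise ValueError("need theta <= theta' and theta1' + theta2' <= 1")
    rng = random.Random(seed)
    z = [rng.random() for _ in range(n)]
    u = sum(zi < t1 for zi in z)
    v = sum(zi > 1 - t2 for zi in z)
    u_p = sum(zi < s1 for zi in z)
    v_p = sum(zi > 1 - s2 for zi in z)
    return (u, v), (u_p, v_p)


draws = [coupled_multinomial(20, (0.2, 0.3), (0.25, 0.4), seed=s) for s in range(2000)]
print(all(u <= u_p and v <= v_p for (u, v), (u_p, v_p) in draws))  # True
print(sum(orig[0] for orig, _ in draws) / len(draws))  # should be near n * theta1 = 4
```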

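Example 5.2. Let $X_1, \ldots, X_n$ be independently and normally distributed with mean $\xi$ and variance $\sigma^2$, and consider the hypothesis $H: \xi \le \xi_0, \sigma \le \sigma_0$. Since $\bar{X}$ and $S^2 = \sum (X_i - \bar{X})^2$ are sufficient for $\xi$ and $\sigma$, we may restrict attention to these statistics.

However, if we try to set $X = \bar{X}$, $Y = S^2$, $\theta_1 = \xi$, $\theta_2 = \sigma$, we encounter certain difficulties. First, the distribution of $X$ does not depend only on $\theta_1$, as Theorem 4.1 requires. While this is not a very important condition of the theorem, a second consideration shows that the monotonicity restriction cannot be applied to this set-up at all: the joint cumulative distribution function of $\bar{X}$ and $S^2$ does not satisfy condition (4.2). For instance, take the monotone region $\bar{x} \ge \xi - c$ with $c > 0$, and increase $\sigma$ while $\xi$ stays fixed. This lowers $P(\bar{X} \ge \xi - c) = \Phi(c\sqrt{n}/\sigma)$, where $\Phi$ is the standard normal distribution function.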

This exhibits an unpleasant feature of the present approach. In a given problem, it is not known a priori whether there will exist variables $X$, $Y$ and a choice of the parameters $\theta_1$, $\theta_2$ for which (4.2) is satisfied. On the other hand, when such variables and parameters have been found, it is not clear that they are the only possible choices. It would of course be interesting to investigate these existence and uniqueness questions. The monotonicity condition, however, is an extraneous restriction anyway, whose suitability must be judged for each problem in terms of the choices for $X$, $Y$, $\theta_1$ and $\theta_2$.

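In the present case we may take $X = (\bar{X} - \xi_0)/S$, $Y = S^2$, $\theta_1 = (\xi - \xi_0)/\sigma$ and $\theta_2 = \sigma$. To check condition (4.2), assume without loss of generality that $\xi_0 = 0$, and let $\theta_1 = \xi/\sigma \le \xi'/\sigma' = \theta_1'$, $\sigma < \sigma'$. If $\sigma' = k\sigma$ and $\xi' = k\xi + c$, let $X_i' = kX_i + c$, so that $S' = kS$ and $\bar{X}' = k\bar{X} + c$. Since $k > 1$ and $\xi \le \xi'/k$, we see that $c \ge 0$. Hence $X' = X + c/(kS) \ge X$ and $Y' = k^2 Y > Y$, so that (4.2) follows.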
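The transformation used above can be checked by simulation. The sketch assumes $\xi_0 = 0$, as in the text; the function names and parameter values are hypothetical. It confirms, sample by sample, that $X_i' = kX_i + c$ moves both statistics upward.

```python
import random
from statistics import fmean


def x_and_y(sample):
    """X = (mean - xi0)/S and Y = S^2 with xi0 = 0, where S^2 = sum (X_i - mean)^2."""
    mean = fmean(sample)
    s2 = sum((v - mean) ** 2 for v in sample)
    return mean / s2 ** 0.5, s2


def check_scale_shift_coupling(xi, sigma, xi_p, sigma_p, n=5, reps=1000, seed=2):
    """Draw samples under (xi, sigma), map X_i' = k X_i + c with k = sigma'/sigma,
    c = xi' - k xi, and confirm X' >= X and Y' >= Y for every replication."""
    k = sigma_p / sigma
    c = xi_p - k * xi
    if not (k > 1 and c >= 0):
        raise ValueError("needs sigma < sigma' and xi/sigma <= xi'/sigma'")
    rng = random.Random(seed)
    for _ in range(reps):
        sample = [rng.gauss(xi, sigma) for _ in range(n)]
        x, y = x_and_y(sample)
        x_p, y_p = x_and_y([k * v + c for v in sample])
        if x_p < x - 1e-12 or y_p < y:  # small tolerance for rounding only
            return False
    return True


print(check_scale_shift_coupling(xi=0.2, sigma=1.0, xi_p=0.5, sigma_p=1.5))  # True
```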

As a last problem let us consider one in which nuisance parameters are present. 

Example 5.3. Let $X_1, \ldots, X_m$; $Y_1, \ldots, Y_n$ be independently normally distributed with common variance $\sigma^2$. Let $E(X_i) = \xi$ and $E(Y_j) = \eta$, and consider the hypothesis $H: \xi \le \xi_0, \eta \le \eta_0$. This time $\bar{X}$, $\bar{Y}$, and $S^2 = \sum (X_i - \bar{X})^2 + \sum (Y_j - \bar{Y})^2$ form a set of sufficient statistics, and again the question arises how to choose $X$, $Y$, $\theta_1$, and $\theta_2$.

Here the principle of invariance (see [5]) leads to a solution very simply. Rewrite the hypothesis in the form $(\xi - \xi_0)/\sigma \le 0$, $(\eta - \eta_0)/\sigma \le 0$. It is then seen that $X = (\bar{X} - \xi_0)/S$, $Y = (\bar{Y} - \eta_0)/S$ constitute a maximal invariant under a suitable group of transformations. The corresponding parameter invariants are of course $\theta_1 = (\xi - \xi_0)/\sigma$, $\theta_2 = (\eta - \eta_0)/\sigma$.
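It remains, once more, to check (4.2). Taking $\xi_0 = \eta_0 = 0$ without loss of generality, let $\xi, \eta, \xi', \eta', \sigma$ be numbers such that

$$\frac{\xi}{\sigma} = \theta_1, \quad \frac{\eta}{\sigma} = \theta_2, \quad \frac{\xi'}{\sigma} = \theta_1', \quad \frac{\eta'}{\sigma} = \theta_2',$$

and let $U_1, \ldots, U_m$; $V_1, \ldots, V_n$ be independently normally distributed with common variance $\sigma^2$ and means $E(U_i) = \xi$, $E(V_j) = \eta$. Let $T^2 = \sum (U_i - \bar{U})^2 + \sum (V_j - \bar{V})^2$. The joint distribution of $U = \bar{U}/T$, $V = \bar{V}/T$ is then the one corresponding to $(\theta_1, \theta_2)$.

Now let $\xi' - \xi = c$ and $\eta' - \eta = d$; both are $\ge 0$, since $\theta_1 \le \theta_1'$ and $\theta_2 \le \theta_2'$. Let $U_i' = U_i + c$ and $V_j' = V_j + d$. The shifts leave $T$ unchanged, so $U' = \bar{U}'/T$, $V' = \bar{V}'/T$ have the joint distribution corresponding to $(\theta_1', \theta_2')$. Also $U' = U + c/T \ge U$ and $V' = V + d/T \ge V$.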

Exactly analogously we can treat the hypothesis $H': \xi/\sigma \le \gamma$, $\eta/\sigma \le \delta$, and the corresponding problems in which the two variances are not assumed to be equal.


6. A multidecision problem. As was pointed out in the introduction, some of the problems considered here really involve a choice among more than two decisions. We shall now indicate, by discussing an example, one method of treating such multidecision problems through successive reduction to problems of the simpler type.

Let us once more consider the hypothesis $H: \theta_1, \theta_2 \le 0$. Assume that, in case of rejection, we wish to decide whether $\theta_1 > 0 > \theta_2$, whether $\theta_2 > 0 > \theta_1$, or whether $\theta_1$ and $\theta_2$ are both $> 0$.* Denote these three regions of the parameter space by $\omega_1$, $\omega_2$, and $\omega_3$, and the associated decisions by $d_1, d_2, d_3$. The set $\theta_1, \theta_2 \le 0$ will be denoted by $\omega_0$, and the associated decision of accepting $H$ by $d_0$.

We shall assume that each of the four pairs of random variables $(\pm X, \pm Y)$ is monotone with respect to the corresponding pair of parameters $(\pm\theta_1, \pm\theta_2)$. We also assume that $p_{\theta_1, \theta_2}(x, y)$ is symmetric in $x$ and $y$, so that the region of acceptance is given by

$$\max(x, y) \le a. \tag{6.1}$$

We must now consider how to divide the complementary region among $d_1$, $d_2$, and $d_3$. Here we again impose the natural monotonicity restrictions. We ask that the region for $d_2$ be monotone non-increasing in $x$ and non-decreasing in $y$, and that the analogous conditions be satisfied by the regions for $d_1$ and $d_3$.

Suppose the problem concerns a standard treatment and two new ones, where $\theta_1$ and $\theta_2$ measure in some way the differences between the new treatments and the standard. The circumstances are such that the most serious error consists in incorrectly rejecting the standard treatment in favor of one of the others. By proper choice of $a$, this probability is controlled so that it is not greater than $\alpha$ for all $(\theta_1, \theta_2) \in \omega_0$.

Next in importance seems to come the possibility of reaching decision $d_2$ in $\omega_1$ or $d_1$ in $\omega_2$, but we shall set these aside for the moment. The next most important error presumably consists in deciding on $d_3$ when the parameter point lies in either $\omega_1$ or $\omega_2$. This error can be controlled in the usual manner by making the $d_3$-region sufficiently small. Thus we may select a number $0 < \beta < 1$ and impose the bounds

$$P(d_3 \mid \omega_1) \le \beta, \qquad P(d_3 \mid \omega_2) \le \beta. \tag{6.2}$$

* A similar multidecision problem has recently been treated by Paulson [6] from a somewhat different point of view.

Subject to these conditions, we wish to maximize $P(d_3)$ in $\omega_3$. Let us now restrict attention to $d_3$-regions of the type $y \ge g(x)$, or $x \ge h(y)$, with $g$ and $h$ non-increasing. By the argument used to prove Theorem 4.2, among all monotone $d_3$-regions satisfying (6.2) there exists one that uniformly maximizes $P(d_3)$ over $\omega_3$. If $P(X > a \mid \theta_1 = 0) \le \beta$, it consists of the points satisfying either $x \ge a, y \ge b$ or $x \ge b, y \ge a$, where $a$, $b$ are determined by

$$P(X \le a, Y \le a \mid \theta_1 = \theta_2 = 0) = 1 - \alpha, \qquad P(X > b \mid \theta_1 = 0) = \beta.$$

If on the other hand $P(X > a \mid \theta_1 = 0) \ge \beta$, the optimum $d_3$-region is given by $x \ge b$, $y \ge b$.

Let the remainder of the sample space be divided symmetrically between $d_1$ and $d_2$ in the obvious manner. It then follows from monotonicity that $P(d_2 \mid \omega_1)$ and $P(d_1 \mid \omega_2)$ both take on their maximum value at $\theta_1 = \theta_2 = 0$. Hence $P(d_2 \mid \omega_1) \le \tfrac{1}{2}\alpha$ and $P(d_1 \mid \omega_2) \le \tfrac{1}{2}\alpha$, so that these errors are also controlled in a satisfactory manner.
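The resulting four-way rule can be written out for a concrete model. For illustration only, the sketch below assumes that $X$ and $Y$ are independent normal with unit variance and means $\theta_1$, $\theta_2$. It also fixes one reading of the "obvious" symmetric split of the remaining region: $d_1$ when $x \ge y$, and $d_2$ otherwise. The function name is hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def multidecision_rule(alpha, beta):
    """Return decide(x, y) -> 0, 1, 2 or 3 for the decisions d0, ..., d3."""
    a = STD.inv_cdf((1 - alpha) ** 0.5)  # P(X <= a, Y <= a | 0, 0) = 1 - alpha
    b = STD.inv_cdf(1 - beta)            # P(X > b | theta1 = 0) = beta
    first_case = 1 - STD.cdf(a) <= beta  # P(X > a | theta1 = 0) <= beta

    def decide(x, y):
        if max(x, y) <= a:
            return 0  # accept H, as in (6.1)
        if first_case:
            d3 = (x >= a and y >= b) or (x >= b and y >= a)
        else:
            d3 = x >= b and y >= b
        if d3:
            return 3
        return 1 if x >= y else 2  # symmetric split between d1 and d2 (assumed)

    return decide


decide = multidecision_rule(alpha=0.05, beta=0.10)
print([decide(*pt) for pt in [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (2.5, 1.5)]])  # [0, 1, 2, 3]
```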


7. Convex regions.³ When we try to apply the results of Section 4 to specific examples, we occasionally meet an obstacle in the condition that $X$ and $Y$ should tend in probability to $\underline{x}$ and $\underline{y}$ as $\theta_1 \to \underline{\theta}_1$ and $\theta_2 \to \underline{\theta}_2$, respectively. We shall now show that, by restricting the acceptance region to be convex as well as monotone, we can prove a result analogous to Theorem 4.1 without assuming degeneracy of the distribution at $\underline{\theta}_1$ and $\underline{\theta}_2$.

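Let us consider again the joint density $p_{\theta_1, \theta_2}(x, y)$ satisfying (4.2), the hypothesis $H: \theta_1 \le \theta_1^0, \theta_2 \le \theta_2^0$, and the set of alternatives $\theta_1 = \theta_1^*$ or $\theta_2 = \theta_2^*$. Putting

$$r_\pi(x, y) = \frac{p_{\theta_1^0, \theta_2^0}(x, y)}{\pi\, p_{\underline{\theta}_1, \theta_2^*}(x, y) + (1 - \pi)\, p_{\theta_1^*, \underline{\theta}_2}(x, y)},$$

we shall assume:

(i) For any $0 < \pi < 1$ and any $C$ the region

$$r_\pi(x, y) \ge C \tag{7.1}$$

is convex and non-decreasing in $x$ and $y$.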


(ii) Given any two points $(x', y')$ and $(x'', y'')$ with $x' < x''$, $y' > y''$, there exist $0 < \pi < 1$ and $C$ such that both points lie on the boundary of the set of points $(x, y)$ satisfying (7.1).

(iii) $p_{\theta_1, \theta_2}(x, y) > 0$ for all $\theta_1, \theta_2, x, y$ under consideration.

3 I am indebted to the referee for several valuable suggestions with regard to this section.

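Consider now the set of alternatives $\omega'$: $\theta_1 = \underline{\theta}_1, \theta_2 = \theta_2^*$ or $\theta_1 = \theta_1^*, \theta_2 = \underline{\theta}_2$. Here $\underline{\theta}_1$ and $\underline{\theta}_2$ may now be any numbers less than $\theta_1^0$ and $\theta_2^0$, respectively. Let $a$ and $b$ be determined by

$$P(X \le a, Y \le b \mid \theta_1^0, \theta_2^0) = 1 - \alpha,$$

$$P(X \le a, Y \le b \mid \underline{\theta}_1, \theta_2^*) = P(X \le a, Y \le b \mid \theta_1^*, \underline{\theta}_2).$$

Then the acceptance region $S$: $x \le a$, $y \le b$ maximizes $\inf_{\omega'} \beta(\theta_1, \theta_2)$ among all monotone and convex level $\alpha$ tests.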

To see this, let $S'$ be any other acceptance region satisfying the conditions that have been imposed. The boundary curves of $S$ and $S'$ have in common either one point, two points, or an interval.

Consider first the case of two points, say $(x', y')$ and $(x'', y'')$. We may then assume $x' < x''$, $y' > y''$, so that there exist $\pi$ and $C$ such that the boundary of (7.1) passes through these two points. From (i) it follows that $r_\pi(x, y) \ge C$ for all points in $S \cap \bar{S}'$, and $r_\pi(x, y) < C$ in $\bar{S} \cap S'$. When the density of $X, Y$ is $p_{\theta_1^0, \theta_2^0}(x, y)$, we have $P(\bar{S}') \le P(\bar{S}) = \alpha$. It therefore follows from the fundamental lemma of Neyman and Pearson that $P(S') > P(S)$ when the density is $\pi\, p_{\underline{\theta}_1, \theta_2^*}(x, y) + (1 - \pi)\, p_{\theta_1^*, \underline{\theta}_2}(x, y)$. Hence $P(S') \le P(S)$ cannot hold at both $(\underline{\theta}_1, \theta_2^*)$ and $(\theta_1^*, \underline{\theta}_2)$, as was to be proved.

Fig. 1.

The same argument applies when the boundaries of $S$ and $S'$ have an interval in common. Consider finally the case of one common point, say $(a, y_0)$. For each $n$, let $(\pi_n, C_n)$ satisfy (ii) for the points $(-n, b)$ and $(a, y_0)$. Then

$$0 < C_n \le \max\left\{ \frac{p_{\theta_1^0, \theta_2^0}(a, y_0)}{p_{\underline{\theta}_1, \theta_2^*}(a, y_0)},\ \frac{p_{\theta_1^0, \theta_2^0}(a, y_0)}{p_{\theta_1^*, \underline{\theta}_2}(a, y_0)} \right\},$$

so there is a subsequence of $\{(\pi_n, C_n)\}$ which converges, say $\pi_n \to \pi^*$, $C_n \to C^*$. The regions $r_{\pi_n}(x, y) \ge C_n$ are monotone and convex, and from this it is easily seen that the boundary of $r_{\pi^*}(x, y) \ge C^*$ is the line $y = y_0$. The remainder of the argument is completely analogous to the two-point case.

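As an example, let

$$p_{\theta_1, \theta_2}(x, y) = C(\theta_1, \theta_2)\, e^{\theta_1 x + \theta_2 y}\, h(x, y), \tag{7.2}$$

and assume that (4.2) holds. Suppose that a priori lower bounds $\underline{\theta}_1, \underline{\theta}_2$ are given for $\theta_1, \theta_2$, such that $p_{\underline{\theta}_1, \underline{\theta}_2}(x, y)$ is again a density. If we let $\theta_1^* = \theta_1^0$, $\theta_2^* = \theta_2^0$, the region (7.1) becomes

$$a\, e^{(\underline{\theta}_1 - \theta_1^0)x} + b\, e^{(\underline{\theta}_2 - \theta_2^0)y} \le k$$

for positive constants $a$, $b$, $k$ depending on $\pi$ and $C$. Conditions (i), (ii), and (iii) are then easily checked.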
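The form of this region can be checked directly; the short derivation below is added for convenience. Here $C(\cdot,\cdot)$ is the normalizing factor in (7.2), and $C$ alone is the constant in (7.1). With $\theta_i^* = \theta_i^0$, substituting (7.2) into $r_\pi$ cancels $h(x, y)$ and gives

$$
\frac{1}{r_\pi(x, y)}
= \pi\,\frac{C(\underline{\theta}_1, \theta_2^0)}{C(\theta_1^0, \theta_2^0)}\, e^{(\underline{\theta}_1 - \theta_1^0)x}
+ (1 - \pi)\,\frac{C(\theta_1^0, \underline{\theta}_2)}{C(\theta_1^0, \theta_2^0)}\, e^{(\underline{\theta}_2 - \theta_2^0)y}.
$$

So $r_\pi \ge C$ holds exactly when $a\, e^{(\underline{\theta}_1 - \theta_1^0)x} + b\, e^{(\underline{\theta}_2 - \theta_2^0)y} \le k$, where $a$ and $b$ are the two positive coefficients above and $k = 1/C$. Because $\underline{\theta}_i < \theta_i^0$, the left side is a sum of convex functions that decrease in $x$ and in $y$. The region is therefore convex and non-decreasing, as (i) requires.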


8. Two-sided problems. We shall discuss only one rather trivial two-sided problem. It is enough, however, to indicate that the type of result obtained here is entirely different from what we found in the one-sided case.

Let $X$ and $Y$ be independently, normally distributed with unit variance and means $\xi$ and $\eta$, respectively, and consider the hypothesis $H: \xi = \eta = 0$. We shall determine the test $\varphi$ that maximizes $\inf_{\omega'} \beta(\xi, \eta)$, where $\omega'$ is the set of points for which either $|\xi|$ or $|\eta|$ is $\ge \gamma$ $(\gamma > 0)$.

Any reasonable test for this problem would presumably attain its minimum power over $\omega'$ at the four points $(0, \gamma)$, $(0, -\gamma)$, $(\gamma, 0)$, $(-\gamma, 0)$. We therefore expect $\varphi$ to be the most powerful test of $H$ against the simple alternative that assigns probability $\tfrac{1}{4}$ to each of these four points. The region of acceptance for this problem is given by $S$:

$$e^{\gamma x} + e^{-\gamma x} + e^{\gamma y} + e^{-\gamma y} \le k.$$

It is easily checked that this region has the following properties:

(i) $S$ is convex.

(ii) If $0 \le x' \le x$, $0 \le y' \le y$ and $(x, y) \in S$, the point $(x', y')$ also lies in $S$.

(iii) For any fixed $\eta$, the probability of $S$ decreases with $|\xi|$; for fixed $\xi$, it decreases with $|\eta|$.

From (iii) it follows that $\varphi$ is the test we are looking for, and it seems to be entirely satisfactory. In fact, suppose we use the symmetry of the situation to reduce the variables to $|X|$, $|Y|$ and the parameters to $|\xi|$, $|\eta|$. We are then faced essentially with a one-sided situation, and it is seen from (i) and (ii) that the acceptance region, interpreted in this way, is both monotone and convex.
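Property (iii) can be checked numerically for a particular choice of $\gamma$. The sketch below uses hypothetical function names, and the midpoint-rule integration is our choice. It proceeds in four steps:

1. Rewrite the acceptance region as $\cosh(\gamma x) + \cosh(\gamma y) \le k/2$.
2. For fixed $x$, the $y$-section is then $|y| \le \operatorname{arccosh}(k/2 - \cosh\gamma x)/\gamma$.
3. Find $k$ from $P(S \mid 0, 0) = 1 - \alpha$ by bisection.
4. Evaluate $P(S \mid \xi, \eta)$.

```python
import math


def _phi(z: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def accept_prob(k, gamma, xi, eta, n_grid=2000):
    """P(S | xi, eta) for S: cosh(gamma x) + cosh(gamma y) <= k / 2, with
    X ~ N(xi, 1), Y ~ N(eta, 1) independent (midpoint rule in x)."""
    half = k / 2.0
    if half <= 2.0:  # S is at most the single point (0, 0)
        return 0.0
    x_max = math.acosh(half - 1.0) / gamma
    h = 2.0 * x_max / n_grid
    total = 0.0
    for i in range(n_grid):
        x = -x_max + (i + 0.5) * h
        t = math.acosh(max(half - math.cosh(gamma * x), 1.0)) / gamma
        p_y = _phi(t - eta) - _phi(-t - eta)  # P(|Y| <= t)
        dens_x = math.exp(-0.5 * (x - xi) ** 2) / math.sqrt(2.0 * math.pi)
        total += p_y * dens_x * h
    return total


def critical_k(alpha, gamma):
    """k with P(S | 0, 0) = 1 - alpha, by bisection (P(S) increases with k)."""
    lo, hi = 4.0, 8.0
    while accept_prob(hi, gamma, 0.0, 0.0) < 1 - alpha:
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if accept_prob(mid, gamma, 0.0, 0.0) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


gamma = 1.0
k = critical_k(0.05, gamma)
for xi in (0.0, 1.0, 2.0):
    print(xi, round(accept_prob(k, gamma, xi, 0.0), 4))
```

The printed acceptance probabilities decrease as $|\xi|$ grows, in line with (iii); by symmetry, the same happens in $|\eta|$.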


9. A general concept of monotonicity. In Sections 4-6 we made use of the notion of monotonicity. We shall conclude this paper by indicating how this concept may be extended to the general decision problem.

Suppose that a partial ordering $\le$ is defined in the sample space and a partial ordering $\le$ in the parameter space. In analogy to condition (4.2), we shall assume the following: if $W$ is any monotone non-decreasing region in the sample space, then for any two parameter points $\theta \le \theta'$ we have $P_\theta(W) \le P_{\theta'}(W)$.

Suppose that a partial ordering is also defined in the decision space, to be denoted by $\preceq$. We shall assume that the loss function $W$ satisfies the conditions:

(i) $d_1 \preceq d_2 \preceq d_3$ and $W(\theta, d_1) < W(\theta, d_2)$ imply $W(\theta, d_2) < W(\theta, d_3)$,

(ii) $\theta_1 \le \theta_2 \le \theta_3$ and $W(\theta_1, d) < W(\theta_2, d)$ imply $W(\theta_2, d) < W(\theta_3, d)$.

Under these assumptions, it seems natural to restrict consideration to monotone decision functions. We shall call a decision function $\delta$ monotone if $x \le x'$ implies $\delta(x) \preceq \delta(x')$.
IMPARTIAL DECISION RULES AND SUFFICIENT STATISTICS¹


By RAGHU RAJ BAHADUR AND LEO A. GOODMAN
The University of Chicago


Summary. A class of decision problems concerning k populations was considered 
in [1] and it was shown that a particular decision rule is the uniformly best 
‘impartial’ decision rule for many problems of this class. The present paper 
provides certain improvements of this result. The authors define impartiality 
in terms of permutations of the k samples rather than in terms of the k ordered 
values of an arbitrarily chosen real-valued statistic as in the earlier paper. 
They point out that (under conditions which are satisfied in the standard cases 
of k independent samples of equal size) if the same function is a sufficient statistic
for each of the k samples then the conditional expectation of an impartial
decision rule given the k sufficient statistics is also an impartial decision rule. A 
characterization of impartial decision rules is given which relates the present 
definition of impartiality with the one adopted in [1]. These results, together 
with Theorem 1 of [1], yield the desired improvements. The argument indicated 
here is illustrated by application to a special case. 


1. Introduction. Let $\pi_1, \pi_2, \ldots, \pi_k$ be a given set of populations, and let the distribution function of a single observation $x_i$ from $\pi_i$ be

$$\Pr(x_i \le z) = G(z, \theta_i) \qquad (-\infty < z < \infty), \tag{1}$$

where $\theta_i$ is an unknown parameter (not necessarily real-valued), $i = 1, 2, \ldots, k$. Write $\omega = (\theta_1, \theta_2, \ldots, \theta_k)$, and let $\Omega$ be the set of all points $\omega$ which are regarded as possible in the given case. Suppose that $n$ independent observations are drawn from $\pi_i$, giving the sample $(x_{i1}, x_{i2}, \ldots, x_{in}) = u_i$, say, for $i = 1, 2, \ldots, k$. Denote the combined sample point $(u_1, u_2, \ldots, u_k)$ by $v$. Let $d(v) = (p_1(v), p_2(v), \ldots, p_k(v))$ be an ordered set of functions $p_i$ of the combined sample point $v$ such that

$$0 \le p_i(v) \le 1, \qquad \sum_{i=1}^{k} p_i(v) = 1 \tag{2}$$

for all $v$. Then $d(v)$ is said to be a decision rule. The statistical problems which motivate this definition may be described as follows.

Suppose that it is desired to determine appropriate sampling rates $p_1, p_2, \ldots, p_k$ for $\pi_1, \pi_2, \ldots, \pi_k$, respectively, $p_i$ being the relative proportion of $x$'s which will be drawn in future from $\pi_i$ $(0 \le p_i \le 1, \sum p_i = 1)$. For example, the given populations may be $k$ varieties of grain, $x_i$ the yield (bushels per acre or dollars profit per acre) from $\pi_i$, and $p_i$ the proportion of the available land on which $\pi_i$ is to be grown, $i = 1, 2, \ldots, k$. Again, $\pi_1, \pi_2, \ldots, \pi_k$ may be sources of a manufactured article, $x_i$ the relevant quality characteristic (e.g., number of hours of service) of an article supplied by $\pi_i$, and $p_1, p_2, \ldots, p_k$ the relative proportions in which a consumer obtains the articles he needs from $\pi_1, \pi_2, \ldots, \pi_k$, respectively. The mixed population obtained by using a given set $d^0 = (p_1^0, p_2^0, \ldots, p_k^0)$ of sampling rates is characterized by the distribution function

$$G(z \mid \omega, d^0) = \sum_{i=1}^{k} p_i^0\, G(z, \theta_i) \qquad (-\infty < z < \infty), \tag{3}$$

where the component distribution functions $G(z, \theta_i)$ are given by (1). The object is to determine $d^0$ in such a way that $G(z \mid \omega, d^0)$ has properties which are desirable in the given case. (For instance, it may be desirable to minimize $G(a \mid \omega, d^0)$, where $a$ is a given constant, or to maximize $G(b + \varepsilon \mid \omega, d^0) - G(b - \varepsilon \mid \omega, d^0)$, where $b$ and $\varepsilon > 0$ are given constants.) If the parameters $\theta_i$ were known, an appropriate $d^0$ could (presumably) be determined. Otherwise, the statistician must resort to sampling the populations and take $d^0$ to be a function of the sample values. If samples of fixed size $n$ are drawn from each population, the statistician will thus be using a decision rule, say $d(v) = (p_1(v), p_2(v), \ldots, p_k(v))$. The expected distribution function of the mixed population obtained by using the rule $d(v)$ is (cf. (3))

$$H(z \mid \omega, d) = E[G(z \mid \omega, d(v)) \mid \omega] = \sum_{i=1}^{k} G(z, \theta_i)\, E[p_i(v) \mid \omega] \qquad (-\infty < z < \infty), \tag{4}$$

where $E[p_i(v) \mid \omega]$ denotes the expected value of $p_i(v)$ when the true parameter point is $\omega$. The statistician's problem is to construct a decision rule $d^*(v)$ such that $H(z \mid \omega, d^*)$ has properties which are desirable in the given case.

1 This paper was prepared in connection with research sponsored by the Office of Naval Research.

A special version of applications of this type is the following. For brevity, write $G_i(z) = G(z, \theta_i)$, and let $\lambda(G)$ be a real-valued functional on the distribution functions $G_i(z)$. Examples are $\lambda(G) = \int z\, dG(z)$ or $\lambda(G) = G(b + \varepsilon) - G(b - \varepsilon)$, where $b$ and $\varepsilon > 0$ are given constants. Writing $\lambda_i = \lambda(G_i)$, suppose that it is desired to select a population, $\pi_r$ say, from the given set $\pi_1, \pi_2, \ldots, \pi_k$ such that $\lambda_r = \max\{\lambda_1, \lambda_2, \ldots, \lambda_k\}$.

Since the $G_i$'s are unknown, it will in general be impossible to make a (or the) correct selection with certainty. Suppose, however, it is agreed to make the selection depend on the outcome of drawing samples of size $n$ from each population. The most general selection procedure is then to use a decision rule, $d(v) = (p_1(v), p_2(v), \ldots, p_k(v))$ say, in the following manner: given $v$, the statistician performs a random experiment whose outcome $\rho$ takes on only the values $1, 2, \ldots, k$, with

$$\Pr(\rho = i) = p_i(v) \qquad (i = 1, 2, \ldots, k),$$

and selects $\pi_\rho$. The probabilities of selecting $\pi_1, \pi_2, \ldots, \pi_k$ are then

$$E[p_1(v) \mid \omega],\ E[p_2(v) \mid \omega],\ \ldots,\ E[p_k(v) \mid \omega], \tag{5}$$

respectively. The problem in such a case might be to construct a decision rule $d^*(v)$ such that the probability of a correct selection in using $d^*(v)$ is “as large as possible.”

In view of the applications (cf. (4), (5)), we shall say that two decision rules $d(v) = (p_1(v), \ldots, p_k(v))$ and $d'(v) = (p_1'(v), \ldots, p_k'(v))$ are equivalent if $E[p_i(v) \mid \omega] = E[p_i'(v) \mid \omega]$ for $i = 1, 2, \ldots, k$ and all $\omega$ in $\Omega$.

We shall concern ourselves primarily with a class of decision rules which seems to be of interest on intuitive grounds: the class of impartial decision rules (see [1], [2]).

Consider first the case $k = 2$. A decision rule $d(v)$ is said to be impartial if $d(u_1, u_2) = (\alpha, \beta)$ implies $d(u_2, u_1) = (\beta, \alpha)$. In other words, $d(v) = (p_1(v), p_2(v))$ is an impartial decision rule if $p_1(u_2, u_1) = p_2(u_1, u_2)$ and $p_2(u_2, u_1) = p_1(u_1, u_2)$ for all $v = (u_1, u_2)$.

In the general case, a decision rule $d(v) = (p_1(v), p_2(v), \ldots, p_k(v))$ is said to be impartial if, for any $v = (u_1, u_2, \ldots, u_k)$ and any permutation $i_1 i_2 \cdots i_k$ of $1 2 \cdots k$, we have $p_j(u_{i_1}, u_{i_2}, \ldots, u_{i_k}) = p_{i_j}(u_1, u_2, \ldots, u_k)$ for $j = 1, 2, \ldots, k$.
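The general definition can be turned into a direct check. The sketch below (all names are hypothetical, and the two example rules are our own) verifies $p_j(u_{i_1}, \ldots, u_{i_k}) = p_{i_j}(u_1, \ldots, u_k)$ for every permutation at a given sample point $v$. A rule that selects the sample with the largest mean and splits ties evenly passes. One that breaks ties in favour of the lowest index does not.

```python
from itertools import permutations
from statistics import fmean


def larger_mean_rule(samples):
    """Select the population(s) with the largest sample mean, splitting ties evenly."""
    means = [fmean(u) for u in samples]
    best = max(means)
    winners = [i for i, m in enumerate(means) if m == best]
    return [1 / len(winners) if i in winners else 0.0 for i in range(len(samples))]


def first_on_ties_rule(samples):
    """Like larger_mean_rule, but ties go to the lowest index (not impartial)."""
    means = [fmean(u) for u in samples]
    best = means.index(max(means))
    return [1.0 if i == best else 0.0 for i in range(len(samples))]


def is_impartial_at(rule, samples, tol=1e-12):
    """Check p_j(u_{i_1}, ..., u_{i_k}) = p_{i_j}(u_1, ..., u_k) for all
    permutations (i_1, ..., i_k) and all j at the sample point v = samples."""
    base = rule(samples)
    for perm in permutations(range(len(samples))):
        permuted = rule([samples[i] for i in perm])
        if any(abs(permuted[j] - base[perm[j]]) > tol for j in range(len(samples))):
            return False
    return True


v = [[1.0, 2.0], [3.0, 0.0], [0.5, 0.5]]  # samples 1 and 2 tie on the mean
print(is_impartial_at(larger_mean_rule, v), is_impartial_at(first_on_ties_rule, v))  # True False
```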

The main result of this paper (Theorem 2) applies to cases whose essential feature is the existence of a function $s(u)$, not necessarily real-valued, on $n$-dimensional sample space, with two properties: $s_i = s(u_i)$ is a sufficient statistic for $\theta_i$ $(i = 1, 2, \ldots, k)$, and the conditional distribution of $u_i$ with $s_i$ fixed equal to $c$ is the same for each $i$. (The necessary conditions are always satisfied if, for example, the $k$ populations are all (i) normal, or (ii) rectangular, or (iii) exponential, or (iv) binomial, or (v) of Poisson type.) Then $t(v) = (s_1, s_2, \ldots, s_k)$ is a sufficient statistic for $\omega$. Upon taking conditional expectations, it is clear that to any decision rule $d(v)$ there corresponds a decision rule $d^*(t(v))$ which is equivalent to $d(v)$ (Theorem 1).

It is not immediately obvious, however, that if $d(v)$ is an impartial decision rule, this equivalent rule $d^*(t(v))$ will also be impartial. We show, in proving Theorem 2, that this is indeed the case. The question raised on page 374 of [1] with reference to Example 2 of that paper is thus answered in the affirmative.

Our final result (Theorem 3) gives a characterization of impartial decision rules which relates the present definition to the one adopted in [1]. Impartiality is a special case of invariance (cf. [6]), so this result is a special case of the following proposition: the conditional expectation of an invariant decision rule is also an invariant decision rule. A discussion of the general proposition will appear elsewhere.

Now, it is known (Theorem 1 of [1]) that there exist two impartial decision rules, called $d_1^*(t(v))$ and $d_2^*(t(v))$, such that in many applications $d_1^*(t(v))$ is the worst one and $d_2^*(t(v))$ the best one in the class of all impartial decision rules of the form $d^*(t(v))$, whatever the unknown parameter point $\omega$ may be; that is, $d_1^*(t(v))$ and $d_2^*(t(v))$ are the uniformly worst and uniformly best decision rules in the class of all impartial decision rules which are based on the sufficient statistics $s_1, s_2, \ldots, s_k$ alone. Theorem 2 shows that in these applications $d_1^*(t(v))$ and $d_2^*(t(v))$ are in fact uniformly worst and uniformly best in the class of all impartial decision rules. (Theorem 1 of [1] is stated and proved only in the "continuous" case, but can be extended to cover the discrete case as well; the necessary modifications become evident upon comparing Theorem 3 of the present paper with the development in [1].) By way of illustration of the argument indicated here, in the final section of the paper we consider certain problems connected with the case when $\pi_1, \pi_2, \ldots, \pi_k$ are normal populations having unknown means $m_i$ and a common variance $\sigma^2$ (which may or may not be known), and prove a result (Theorem 4) which generalizes Example 1 of [1] as well as a result due to Simon [2] for the case $k = 2$.


2. Theorems. The reader is referred to [3] for an account of such measure-theoretic terms and results as we use without explanation in what follows.

Throughout this section we write '(sub)set' for 'Borel-measurable (sub)set' and 'function' for 'Borel-measurable function.' Functions whose range is not specified are understood to be real-valued. $R^a$ denotes a fixed subset of the set of all points $z = (z_1, z_2, \ldots, z_a)$ with real coordinates $z_i$. (In our discussion, some of the spaces $R^a$ will be given at the outset, and all the others will be defined explicitly in terms of them.) For any subset $A$ of $R^a$, $\chi_A(z)$ denotes the characteristic function of $A$; that is, $\chi_A(z) = 1$ for $z$ in $A$ and $\chi_A(z) = 0$ for $z$ in $R^a - A$.

Let $f$ be a nonnegative function on $R^a$ and let $\lambda$ be a measure on $R^a$ (more precisely, a measure on the subsets of $R^a$) such that

$$\int_{R^a} f \, d\lambda = 1.$$

Let $Z$ be a random variable taking values in $R^a$ such that the probability of the event $\{Z \in A\}$ is

$$\int_A f(z) \, d\lambda$$

for all sets $A$. We then say that $Z$ is distributed (on $R^a$) according to $f(z)\, d\lambda$.
Let $U_1, U_2, \ldots, U_k$ be independent random variables whose joint distribution is governed by a parameter $\omega$ taking values in a space $\Omega$. Each $U_i$ takes values in a set $R^n$ of points $u$. Let $s$ be a function on $R^n$ onto a set $R^m$ of points $y$. Let $h(u)$ be a nonnegative function of $u$, and let $\mu$ be a $\sigma$-finite measure on $R^n$. Corresponding to each $\omega$ in $\Omega$ and each $i = 1, 2, \ldots, k$ let $g_i(y : \omega)$ be a nonnegative function of $y$ such that

$$\int_{R^n} h(u)\, g_i(s(u) : \omega) \, d\mu = 1.$$

It is assumed that $U_i$ is distributed according to $h(u)\, g_i(s(u) : \omega)\, d\mu$ $(i = 1, 2, \ldots, k)$. Let $R^{kn}$ $(= R^n \times R^n \times \cdots \times R^n)$ be the set of all points $v = (u_1, u_2, \ldots, u_k)$ with $u_i$ in $R^n$ $(i = 1, 2, \ldots, k)$, and write

$$\alpha(v) = \prod_{i=1}^k h(u_i), \qquad t(v) = (s(u_1), s(u_2), \ldots, s(u_k)), \qquad \beta(t(v) : \omega) = \prod_{i=1}^k g_i(s(u_i) : \omega).$$
Let $\mu^k$ be the product measure $\mu \times \mu \times \cdots \times \mu$ on $R^{kn}$. Then $V = (U_1, U_2, \ldots, U_k)$ is distributed according to $\alpha(v)\, \beta(t(v) : \omega)\, d\mu^k$. If $\phi$ is a function on $R^{kn}$, we shall denote the expected value (if it exists) of $\phi(V)$, that is, the integral

$$\int_{R^{kn}} \phi(v)\, \alpha(v)\, \beta(t(v) : \omega) \, d\mu^k,$$

by $E[\phi(v) \mid \omega]$.

Let $R^{km}$ $(= R^m \times R^m \times \cdots \times R^m)$ be the set of all values of $t$ as $v$ ranges over $R^{kn}$, and let the generic point of $R^{km}$ be denoted by $w$ or by $(y_1, y_2, \ldots, y_k)$. It can be shown that the preceding assumptions and definitions imply that $t$ is a sufficient statistic for $\omega$ when the sample space is $R^{kn}$; that is, corresponding to each subset $A$ of $R^{kn}$ there exists a function $\phi_A$, $0 \le \phi_A \le 1$, on $R^{km}$ such that

$$E[\chi_B(t(v))\, \chi_A(v) \mid \omega] = E[\chi_B(t(v))\, \phi_A(t(v)) \mid \omega]$$

for all subsets $B$ of $R^{km}$ and all $\omega$ in $\Omega$. ($\phi_A(w)$ is called the conditional probability of the event $\{V \in A\}$ given $t(V) = w$ and any $\omega$ in $\Omega$.) This property of $t$ does not, however, suffice for our purpose; we require in addition the following result concerning the structure of the functions $\phi_A(w)$.

Lemma. Corresponding to each $y$ in $R^m$ there exists a probability measure $\lambda_y$ on $R^n$ such that for each $A$ and $\omega$ we may take $\phi_A(w) = \nu_w(A)$, where, for fixed $w = (y_1, y_2, \ldots, y_k)$, $\nu_w$ is the product measure $\lambda_{y_1} \times \lambda_{y_2} \times \cdots \times \lambda_{y_k}$ on $R^{kn}$.

A proof of the lemma can be constructed along the following lines. (i) There exist functions $\bar g_i(s : \omega)$ and a fixed probability measure $\mu^*$ on $R^n$ such that $U_i$ is distributed according to $\bar g_i(s(u) : \omega)\, d\mu^*$ $(i = 1, 2, \ldots, k;\ \omega \in \Omega)$. (ii) There exist functions $\lambda_y(C)$ such that, for each $y$, $\lambda_y$ is a probability measure on $R^n$, and for each subset $C$ of $R^n$, $\lambda_y(C)$ is the conditional measure of $C$ given $s(u) = y$ when $\mu^*$ is the unconditional measure on $R^n$ (cf. Exercise (5) on page 210 of [3]). (iii) For each $i = 1, 2, \ldots, k$ and any set $C$, $\lambda_y(C)$ is the conditional probability of the event $\{U_i \in C\}$ given $s(U_i) = y$ and any $\omega$ in $\Omega$. Finally, (iv) "the conditional joint distribution of $U_1, U_2, \ldots, U_k$ given $s(U_i) = y_i$, $i = 1, 2, \ldots, k$, and $\omega$ is the product of the individual conditional distributions under the corresponding individual conditions," and the lemma follows. We omit the detailed verifications.

The reader should satisfy himself that (with a suitable definition of the sufficient statistic $t$ in each case) the lemma applies to all standard cases of $k$ independent samples of equal size. It is therefore likely to prove useful also in contexts other than the present one.

Now let $d(v) = (p_1(v), p_2(v), \ldots, p_k(v))$ be a decision rule. Write

$$p_i^*(w) = \int_{R^{kn}} p_i(v) \, d\nu_w .$$

Since $\nu_w$ is a probability measure, it is clear from (2) that $0 \le p_i^*(w) \le 1$ and $\sum_{i=1}^k p_i^*(w) = 1$, so that $d^*(t(v)) = (p_1^*(t(v)), p_2^*(t(v)), \ldots, p_k^*(t(v)))$ is a decision rule. It follows from Exercise (6) on page 211 of [3] that $p_i^*(w)$ is the conditional expectation of $p_i(V)$ given $t(V) = w$ and any $\omega$ in $\Omega$; that is,

$$E[\chi_B(t(v))\, p_i(v) \mid \omega] = E[\chi_B(t(v))\, p_i^*(t(v)) \mid \omega]$$

for all subsets $B$ of $R^{km}$ and all $\omega$ in $\Omega$ $(i = 1, 2, \ldots, k)$. Taking $B = R^{km}$ we see that $d^*(t(v))$ is equivalent to $d(v)$. Thus we have

THEOREM 1. Corresponding to any decision rule $d(v)$ there exists an equivalent decision rule $d^*(t(v))$.

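Suppose now that $d(v)$ is an impartial decision rule. It is easy to see that in that case $d^*(t(v))$ must also be impartial. Consider the case $k = 2$. Then for any point $(y_1, y_2)$ of $R^{2m}$ we have

$$\begin{aligned}
p_1^*(y_1, y_2) &= \int_{R^{2n}} p_1(v)\, d\nu_{(y_1, y_2)}(v) \\
&= \int_{R^n} \Big\{ \int_{R^n} p_1(u_1, u_2)\, d\lambda_{y_1}(u_1) \Big\}\, d\lambda_{y_2}(u_2) \\
&= \int_{R^n} \Big\{ \int_{R^n} p_2(u_2, u_1)\, d\lambda_{y_1}(u_1) \Big\}\, d\lambda_{y_2}(u_2) \\
&= \int_{R^n} \Big\{ \int_{R^n} p_2(u_2, u_1)\, d\lambda_{y_2}(u_2) \Big\}\, d\lambda_{y_1}(u_1) \\
&= \int_{R^{2n}} p_2(v)\, d\nu_{(y_2, y_1)}(v) = p_2^*(y_2, y_1)
\end{aligned}$$

(the third line uses the impartiality of $d(v)$, and the fourth interchanges the order of integration), and the impartiality of $d^*(t(v))$ is proved. A parallel argument applies to the general case. Hence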


THEOREM 2. Corresponding to any impartial decision rule $d(v)$ there exists an equivalent impartial decision rule $d^*(t(v))$.

We remind the reader that the $d^*(t(v))$ which we have constructed in terms of the given $d(v)$ is equivalent to $d(v)$ by virtue of the fact that $d^*(t(v))$ is the conditional expectation of $d(V)$ given $t(V) = t(v)$ and any $\omega$ in $\Omega$. In many statistical applications, the loss incurred in adopting a particular decision $d^0 = (p_1, p_2, \ldots, p_k)$ when $\omega$ is the parameter point is of the form

$$l(\omega, d^0) = \sum_{i=1}^k f_i(\omega)\, p_i, \qquad f_i(\omega) \ge 0.$$

For each $\omega$ let $c(\omega, x)$ be a bounded convex function of $x$ defined for $\min_i\{f_i(\omega)\} \le x \le \max_i\{f_i(\omega)\}$ and write $\psi(\omega, d^0) = c(\omega, l(\omega, d^0))$. Then $\psi$ is a convex function of $d^0$ for each fixed $\omega$. It follows from Lemma 3.1 of [4] that, irrespective of the particular weights $f_i(\omega)$ and particular function $c(\omega, x)$, we have

$$E[\psi(\omega, d^*(t(v))) \mid \omega] \le E[\psi(\omega, d(v)) \mid \omega] \quad \text{for all } \omega.$$


Our immediate purpose in stating this consequence of the relation between $d(v)$ and $d^*(t(v))$ will be served by noting the following easy corollaries of the general result: (i) the expected loss when using $d^*(t(v))$ always equals the expected loss when using $d(v)$, and (ii) the variance of the loss when using $d^*(t(v))$ never exceeds the variance when using $d(v)$. (For (i) take $c(\omega, x) = x$ and $c(\omega, x) = -x$; for (ii) take $c(\omega, x) = (x - e)^2$, where $e$ is the common expected loss given by (i).) Now, the equivalence of $d^*(t(v))$ and $d(v)$ also implies (i), but results such as (ii) do not follow from equivalence alone. There is, therefore, a somewhat stronger justification than the one given by Theorems 1 and 2 for using decision rules which depend on the outcome $v$ only through $t$.
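A small exact check of Theorems 1 and 2 and of corollary (ii) is possible when each population is Bernoulli, so that $s(u)$ is the number of successes and $\lambda_y$ is the uniform distribution over the 0/1 samples with $y$ successes. In the following sketch (an illustration under assumptions: $k = 2$, $n = 3$, and the rule $p_1$ and the parameter values are arbitrary choices) $p_1^*$ is obtained by averaging $p_1$ over $\nu_{(y_1, y_2)}$, and the loss is taken to be $l = p_1$, i.e. $f_1 = 1$, $f_2 = 0$.

```python
from fractions import Fraction
from itertools import product

n = 3                                        # observations per population
SAMPLES = list(product((0, 1), repeat=n))   # sample space of one population


def p1(u1, u2):
    """An impartial rule that ignores sufficiency: compare first observations,
    splitting 1/2 : 1/2 on ties.  p2 = 1 - p1."""
    if u1[0] != u2[0]:
        return Fraction(int(u1[0] > u2[0]))
    return Fraction(1, 2)


def p1_star(y1, y2):
    """Average of p1 under lambda_{y1} x lambda_{y2}: uniform over the samples
    with s(u1) = y1 and s(u2) = y2."""
    c1 = [u for u in SAMPLES if sum(u) == y1]
    c2 = [u for u in SAMPLES if sum(u) == y2]
    return sum(p1(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def prob(u, theta):
    """Probability of the 0/1 sample u when the success probability is theta."""
    s = sum(u)
    return theta**s * (1 - theta) ** (len(u) - s)


def expect(f, th1, th2):
    """E[f(U1, U2) | omega] for omega = (th1, th2), computed exactly."""
    return sum(f(a, b) * prob(a, th1) * prob(b, th2) for a in SAMPLES for b in SAMPLES)


def star(a, b):
    """p1*(t(v)): the rule d* evaluated at the sufficient statistic."""
    return p1_star(sum(a), sum(b))


th1, th2 = Fraction(1, 3), Fraction(3, 4)
mean_d, mean_star = expect(p1, th1, th2), expect(star, th1, th2)
var_d = expect(lambda a, b: p1(a, b) ** 2, th1, th2) - mean_d**2
var_star = expect(lambda a, b: star(a, b) ** 2, th1, th2) - mean_star**2
print(mean_d == mean_star, var_star <= var_d)  # True True
```

Rational arithmetic makes the equality of expected losses exact rather than approximate; impartiality of $d^*$ appears as $p_1^*(y_1, y_2) = 1 - p_1^*(y_2, y_1)$.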

We shall now give a useful representation of impartial decision rules. Let $\phi(u)$ be a real-valued function on $R^n$ and for any $v = (u_1, u_2, \ldots, u_k)$ set $\phi_i = \phi(u_i)$, $i = 1, 2, \ldots, k$. Let $D(\phi)$ be the class of all impartial decision rules which are based on $\phi_1, \phi_2, \ldots, \phi_k$ alone, that is, all impartial decision rules of the form $d(v) = (p_1(\phi_1, \ldots, \phi_k), p_2(\phi_1, \ldots, \phi_k), \ldots, p_k(\phi_1, \ldots, \phi_k))$. Since $\phi(u)$ is a given function, $D(\phi)$ will, in general, be a subclass of the class of all impartial decision rules, but may coincide with it. In any case, for given $v$, let $\phi_{(1)}, \phi_{(2)}, \ldots, \phi_{(k)}$ be the $k$ (not necessarily distinct) numbers $\phi_i$ arranged in ascending order of magnitude and write

$$a_{ij} = \begin{cases} 1 & \text{if } \phi_i = \phi_{(j)}, \\ 0 & \text{otherwise} \end{cases} \qquad (i, j = 1, 2, \ldots, k).$$

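THEOREM 3. A decision rule $d(v) = (p_1(v), p_2(v), \ldots, p_k(v))$ is a member of $D(\phi)$ if and only if there exist functions $\lambda_j(z_1, z_2, \ldots, z_k)$, $j = 1, 2, \ldots, k$, such that

$$\lambda_j \ge 0, \qquad \sum_{j=1}^k \lambda_j = 1,$$

and such that for each $i = 1, 2, \ldots, k$

$$p_i(v) = \sum_{j=1}^k \frac{a_{ij}}{\sum_{m=1}^k a_{mj}}\, \lambda_j(\phi_{(1)}, \phi_{(2)}, \ldots, \phi_{(k)})$$

for all $v$.

The proof is by direct verification and is omitted.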
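Read as a recipe, Theorem 3 says that a rule in $D(\phi)$ is fixed by weights $\lambda_j$ attached to the ranks $1, \ldots, k$, with tied values sharing the weights of the ranks they occupy. The sketch below builds $p_i$ from the displayed formula under that reading; the function names and the example weight functions are hypothetical, and exact rational arithmetic is used so that $\sum_i p_i = 1$ can be checked exactly.

```python
from fractions import Fraction
from itertools import permutations
from typing import Callable, Sequence, Tuple

Weights = Callable[[Tuple[Fraction, ...]], Sequence[Fraction]]


def rule_from_rank_weights(lam: Weights):
    """Return d with p_i = sum_j a_ij / (sum_m a_mj) * lambda_j(phi_(1), ..., phi_(k)),
    where a_ij = 1 if phi_i == phi_(j) and 0 otherwise."""
    def rule(phis: Sequence[Fraction]) -> Tuple[Fraction, ...]:
        k = len(phis)
        order = tuple(sorted(phis))                       # phi_(1) <= ... <= phi_(k)
        weights = lam(order)
        a = [[int(phis[i] == order[j]) for j in range(k)] for i in range(k)]
        col = [sum(a[m][j] for m in range(k)) for j in range(k)]  # each column sum >= 1
        return tuple(sum(Fraction(a[i][j], col[j]) * weights[j] for j in range(k))
                     for i in range(k))
    return rule


# Hypothetical weights: all weight on the largest rank ("pick the largest phi") ...
pick_largest = rule_from_rank_weights(lambda o: [Fraction(0)] * (len(o) - 1) + [Fraction(1)])
# ... and (for k = 3) weights that depend on the ordered values themselves.
mixed = rule_from_rank_weights(
    lambda o: [Fraction(1, 2), Fraction(1, 2), Fraction(0)] if o[0] < 0
    else [Fraction(0), Fraction(1, 3), Fraction(2, 3)])

phis = (Fraction(1), Fraction(3), Fraction(3))
print(pick_largest(phis))   # the two tied largest values share the weight equally
# Impartiality at this point: permuting the phi's permutes the p's accordingly.
print(all(pick_largest(tuple(phis[i] for i in perm))[j] == pick_largest(phis)[perm[j]]
          for perm in permutations(range(3)) for j in range(3)))
```

The division by $\sum_m a_{mj}$ is what keeps $\sum_i p_i = 1$ when several $\phi_i$ coincide.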


3. An application. Let $\pi_1, \pi_2, \ldots, \pi_k$ be normal populations, $\pi_i$ having an unknown mean $m_i$ and variance $\sigma^2$. Write $\theta_i = (m_i, \sigma)$, $\omega = (\theta_1, \theta_2, \ldots, \theta_k)$, and let $\Omega$ be the set of all points $\omega$ which are regarded as being possible in the given case. Let $g_i(\omega)$, $i = 1, 2, \ldots, k$, be functions defined on $\Omega$ such that $m_i \le m_j$ implies $g_i \le g_j$, $i, j = 1, 2, \ldots, k$. Suppose that samples $u_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ of $n$ independent observations are drawn from each of the populations $\pi_i$, giving the combined sample point $v = (u_1, u_2, \ldots, u_k)$. For any decision rule $d(v) = (p_1(v), p_2(v), \ldots, p_k(v))$ and any $\omega$ in $\Omega$ let the expected loss, or risk, be given by

$$r(d \mid \omega) = \max_i \{g_i(\omega)\} - \sum_{i=1}^k g_i(\omega)\, E[p_i(v) \mid \omega]. \tag{6}$$
Regarded as a function of $\omega$, $r(d \mid \omega)$ is called the risk function of $d(v)$. The problem is to construct (if possible) an impartial decision rule $d^*(v)$ such that $r(d^* \mid \omega)$ is as small as possible no matter what $\omega$ may be. We shall show that the problem has a solution which is independent of the functions $g_i$. We shall also describe two determinations of the functions $g_i$ which seem to be of special interest.


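For any $v$, set $\bar x_i = n^{-1} \sum_{j=1}^n x_{ij}$, $i = 1, 2, \ldots, k$, and $\bar x_{(1)} = \min_i\{\bar x_i\}$, $\bar x_{(k)} = \max_i\{\bar x_i\}$, and let $a(v)$ = number of $\bar x_i$'s which equal $\bar x_{(1)}$, $b(v)$ = number of $\bar x_i$'s which equal $\bar x_{(k)}$. (Of course, we have $\Pr(a(v) = b(v) = 1 \mid \omega) = 1$ for all $\omega$.) Write

$$p_j'(v) = \begin{cases} 1/a(v) & \text{if } \bar x_j = \bar x_{(1)}, \\ 0 & \text{otherwise} \end{cases} \qquad (j = 1, 2, \ldots, k),$$

$$p_j''(v) = \begin{cases} 1/b(v) & \text{if } \bar x_j = \bar x_{(k)}, \\ 0 & \text{otherwise} \end{cases} \qquad (j = 1, 2, \ldots, k).$$

It is then clear that $d_1(v) = (p_1'(v), p_2'(v), \ldots, p_k'(v))$ and $d_2(v) = (p_1''(v), p_2''(v), \ldots, p_k''(v))$ are fixed impartial decision rules which depend on $v$ only through $\bar x_1, \bar x_2, \ldots, \bar x_k$.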
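The two rules $d_1$ and $d_2$ depend on the data only through the sample means. A direct transcription, assuming each $u_i$ is given as a list of floats (the function names are ours), might read:

```python
from statistics import fmean
from typing import List, Sequence


def d1(v: Sequence[Sequence[float]]) -> List[float]:
    """p'_j = 1/a(v) if xbar_j equals the smallest sample mean, else 0."""
    means = [fmean(u) for u in v]
    lo = min(means)
    a = means.count(lo)                  # a(v); equal to 1 with probability one
    return [1 / a if m == lo else 0.0 for m in means]


def d2(v: Sequence[Sequence[float]]) -> List[float]:
    """p''_j = 1/b(v) if xbar_j equals the largest sample mean, else 0."""
    means = [fmean(u) for u in v]
    hi = max(means)
    b = means.count(hi)                  # b(v); equal to 1 with probability one
    return [1 / b if m == hi else 0.0 for m in means]


v = [[1.0, 2.0], [0.0, 0.0], [5.0, 1.0]]      # sample means 1.5, 0.0, 3.0
print(d1(v), d2(v))                            # [0.0, 1.0, 0.0] [0.0, 0.0, 1.0]
```

Exact ties between sample means matter only for discrete inputs such as the toy data here; for normal samples they occur with probability zero.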


THEOREM 4. Let $D$ be the class of all impartial decision rules $d(v)$. Then

$$r(d_1 \mid \omega) = \sup_{d \in D} r(d \mid \omega), \qquad r(d_2 \mid \omega) = \inf_{d \in D} r(d \mid \omega)$$

for all $\omega$ in $\Omega$.

PROOF. Choose and fix an arbitrary impartial decision rule $d(v)$. Let $c > 0$ be any constant such that the subset $\Omega_c = \{\omega : \omega \in \Omega, \sigma = c\}$ is non-empty. Now, corresponding to each $\omega$ in $\Omega_c$, the probability density (with respect to $n$-dimensional Lebesgue measure) of the sample from $\pi_i$ is of the form $h_c(u)\, g_i(\bar x : \omega)$, $i = 1, 2, \ldots, k$. Write $t(v) = (\bar x_1, \bar x_2, \ldots, \bar x_k)$. It follows from Theorem 2 that there exists an impartial decision rule based on $t(v)$ alone, say $d_c^*(\bar x_1, \bar x_2, \ldots, \bar x_k)$, which is equivalent to $d(v)$ provided $\omega$ is restricted to $\Omega_c$. From equivalence and (6), we have

$$r(d \mid \omega) = r(d_c^* \mid \omega) \tag{7}$$

for $\omega$ in $\Omega_c$. Now, since for $i \ne j$ we have $\Pr(\bar x_i = \bar x_j \mid \omega) = 0$ for all $\omega$, it follows that (with probability one for all $\omega$) the representation of impartial decision rules based on $\bar x_1, \bar x_2, \ldots, \bar x_k$ which is given by Theorem 3 coincides with the representation assumed in Theorem 1 of [1]. An application of this last theorem (cf. Example 1 in Section 6 of [1]) shows that

$$r(d_1 \mid \omega) \ge r(d_c^* \mid \omega), \qquad r(d_2 \mid \omega) \le r(d_c^* \mid \omega) \tag{8}$$

for all $\omega$. It follows from (7) and (8) that $r(d_1 \mid \omega) \ge r(d \mid \omega)$ and $r(d_2 \mid \omega) \le r(d \mid \omega)$ for $\omega$ in $\Omega_c$. Since both $d(v)$ and $c$ are arbitrary, Theorem 4 is proved.
In conclusion, we describe two applications of Theorem 4. Suppose that $v$ is the outcome of preliminary experiments on $\pi_1, \pi_2, \ldots, \pi_k$, and that it is now desired to draw a total of $N$ observations from the $k$ populations in such a way that the mathematical expectation of the sum of the values obtained is as large as possible. Let $d(v) = (p_1(v), p_2(v), \ldots, p_k(v))$ be a suitable decision rule and suppose that $N p_i(v)$ observations are drawn from $\pi_i$, $i = 1, 2, \ldots, k$. Then the mathematical expectation of the sum of the values obtained is $N \sum_{i=1}^k m_i E[p_i(v) \mid \omega]$. Since the maximum of this quantity is $N \max\{m_1, m_2, \ldots, m_k\}$, the expected loss in using $d(v)$ may be taken to be

$$N\Big[\max_i \{m_i\} - \sum_{i=1}^k m_i E[p_i(v) \mid \omega]\Big]. \tag{9}$$

The expected loss is of the form (6), with $g_i(\omega) = N m_i$ for $i = 1, 2, \ldots, k$. It follows that in the class of all impartial decision rules the uniformly best rule is to draw an equal number of observations from the populations $\pi_i$ such that $\bar x_i = \max\{\bar x_1, \bar x_2, \ldots, \bar x_k\}$ and none from the others.

Suppose now that it is desired to select one of the populations $\pi_i$, the object being to select a population, $\pi_j$ say, such that $m_j = \max\{m_1, m_2, \ldots, m_k\}$. As pointed out in the introductory section, the statistician may then employ a suitable decision rule, say $d(v) = (p_1(v), \ldots, p_k(v))$, in the following way: given $v$, he performs a random experiment whose outcome $\rho$ takes on the values $1, 2, \ldots, k$ with $\Pr(\rho = i) = p_i(v)$ $(i = 1, 2, \ldots, k)$, and selects $\pi_\rho$. Write $m_{(k)} = \max\{m_1, m_2, \ldots, m_k\}$, and set

$$g_i(\omega) = \begin{cases} 1 & \text{if } m_i = m_{(k)}, \\ 0 & \text{otherwise} \end{cases} \qquad (i = 1, 2, \ldots, k).$$

Then it is readily seen from (5) and (6) that, with the present convention for the manner in which a decision rule $d(v)$ is to be used, we have

$$r(d \mid \omega) = \Pr(\text{incorrect selection using } d(v) \mid \omega), \tag{10}$$

since $\max_i g_i(\omega) = 1$ and $\sum_i g_i(\omega) E[p_i(v) \mid \omega]$ is the probability of selecting a population with the largest mean. It follows from Theorem 4 that in the class of all impartial decision rules, the rule $d_2(v)$, which assigns equal probabilities of selection to the populations $\pi_i$ such that $\bar x_i = \max\{\bar x_1, \bar x_2, \ldots, \bar x_k\}$ and zero probabilities to the rest, minimizes the probability of an incorrect selection uniformly for all $\omega$ in $\Omega$.
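Theorem 4 can be illustrated, though of course not proved, by simulation. The sketch below estimates the risk (10) for $d_2$, for $d_1$, and for the impartial rule that ignores the data and gives each population probability $1/k$; by Theorem 4 the first estimate should be the smallest and the second the largest. The means, $\sigma$, $n$, the seed and the number of replications are arbitrary choices, and the risk is averaged over the selection probabilities rather than over simulated draws of $\rho$.

```python
import random
from statistics import fmean


def d1(v):
    """Select the smallest sample mean (ties split equally)."""
    means = [fmean(u) for u in v]
    lo = min(means)
    return [1 / means.count(lo) if m == lo else 0.0 for m in means]


def d2(v):
    """Select the largest sample mean (ties split equally)."""
    means = [fmean(u) for u in v]
    hi = max(means)
    return [1 / means.count(hi) if m == hi else 0.0 for m in means]


def uniform(v):
    """Impartial rule that ignores the data."""
    return [1 / len(v)] * len(v)


def risk(rule, means, sigma, n, reps, seed):
    """Monte Carlo estimate of (10): 1 - Pr(selecting a population with the largest mean)."""
    rng = random.Random(seed)
    best = max(means)
    total = 0.0
    for _ in range(reps):
        v = [[rng.gauss(m, sigma) for _ in range(n)] for m in means]
        probs = rule(v)
        total += 1.0 - sum(p for p, m in zip(probs, means) if m == best)
    return total / reps


means, sigma, n = [0.0, 0.0, 0.5], 1.0, 10
r2, r_unif, r1 = (risk(f, means, sigma, n, 4000, seed=1) for f in (d2, uniform, d1))
print(f"d2: {r2:.3f}  uniform: {r_unif:.3f}  d1: {r1:.3f}")
```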

The reader is referred to [5] for an investigation from a more general viewpoint 
of the problem of minimizing (9) or (10) in the case k = 2. The discussion in 
[5] does not presuppose samples of equal size, and the class of all decision rules 
is taken into consideration. 
SOME DISTRIBUTION-FREE TESTS FOR THE DIFFERENCE BETWEEN
TWO EMPIRICAL CUMULATIVE DISTRIBUTION FUNCTIONS 


By E. F. Drion 
Statistics Department, T.N.O., The Hague 


1. Summary and introduction. It sometimes happens that of two empirical cumulative distribution curves (step curves) one lies entirely above the other; in other words, except at both ends, they have no point in common. The problem then arises: what is the probability that this will happen when both curves come from random samples of the same population? In this paper a partial answer will be given, based on the ingenious solution of André (as cited in the well-known textbook of Bertrand [1] for the ballot problem, and also in Chap. VIII, Sect. 5 of [7]). Moreover, an analogous method will allow us to give an exact answer to the problem of the maximum difference between two empirical cumulative distribution functions of random samples from the same population, but only if both samples have the same size. Smirnov has given an asymptotic solution for the latter problem (cited by Feller [3]; see also [2]).

Our result leads, by using the Stirling approximation for the factorials, to the asymptotic formula of Smirnov.

A comparison of numerical results of the exact formula and the asymptotic formula of Smirnov shows that, at least in the case of equal samples, the probabilities calculated by the Smirnov formula have, for samples as small as 20, an error of less than 4% for probabilities of 0.033 or more. (See also Massey [5], who has calculated the exact probabilities for equal samples by means of difference equations.)


2. Statement of the problem. Let a population $P$ be given with an unknown continuous distribution function $F(x)$. From this population two random samples $x_1, \ldots, x_{n_1}$ and $y_1, \ldots, y_{n_2}$ are drawn. After ordering each sample from the smallest value to the greatest we shall call them $x_1, \ldots, x_{n_1}$ and $y_1, \ldots, y_{n_2}$. For each sample the empirical distribution function (step function) $F_1(x)$ or $F_2(y)$ is constructed:

$$F_1(x) = \begin{cases} 0, & x < x_1, \\ i/n_1, & x_i \le x < x_{i+1}, \\ 1, & x_{n_1} \le x, \end{cases} \qquad F_2(y) = \begin{cases} 0, & y < y_1, \\ j/n_2, & y_j \le y < y_{j+1}, \\ 1, & y_{n_2} \le y. \end{cases}$$

As we have assumed that the population has a continuous distribution function, $\Pr(x_i = y_j) = 0$ for all $i$ and $j$; that is, the discontinuities of the two step functions have, except on an event of probability zero, unequal abscissae. Under these assumptions we ask for:

A. The probability that either $F_1(x) - F_2(x) < 0$ for all values of $x$ between $\min(x_1, y_1)$ and $\max(x_{n_1}, y_{n_2})$ (boundaries not included), or $F_1(x) - F_2(x) > 0$ for all such $x$.
B. The probability that $\max_x |F_1(x) - F_2(x)| \ge d$.

We shall give a general solution of problem A both for the case $n_1 = n_2$ and for the case that the greatest common divisor of $n_1$ and $n_2$ equals one. For problem B a solution has only been found for the case $n_1 = n_2$.


3. Graphical representation of two ordered samples. If we order the observations of both samples in one series according to their magnitude, so that we have a series of $n_1 + n_2$ terms of the form $x_1, x_2, y_1, x_3, y_2, \ldots, y_{n_2}$, say, then our problem A is equivalent to the following: What is the probability that, in a random series of $n_1$ $x$'s and $n_2$ $y$'s, the proportion of $x$'s to $y$'s from the first to the $n$-th term of the series (where $n$ may take all values from 1 to $n_1 + n_2 - 1$ inclusive) is, for each $n$, always smaller than $n_1/n_2$ or always larger than $n_1/n_2$?

That both problems are equivalent may be shown in this way. If the two series of observations are random samples from the same population, they may be considered as one sample of size $n_1 + n_2$, in which $n_1$ observations are marked $x$ and $n_2$ are marked $y$. The marking of the observations does not depend (in random samples) on the result of the observations, so all orders of the $x$'s and $y$'s are equally probable.

To solve this problem we shall make use of a graphical representation of these series. Let the $x$'s represent horizontal paces and the $y$'s vertical paces; then all possible series will be represented by all possible routes joining the diagonal corners of a rectangular lattice of sides $n_1$ and $n_2$. Those routes which have no point in common with the diagonal of our rectangle (except the end-points) represent series where the proportion of $x$'s to $y$'s is either always larger than $n_1/n_2$ or always smaller.
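The passage from two samples to a lattice route, and the resulting test for problem A, can be sketched as follows (function names are ours). At the lattice point reached after $i$ horizontal and $j$ vertical paces, $F_1 - F_2 = i/n_1 - j/n_2$, so its sign is that of $n_2 i - n_1 j$: positive below $OP$ and negative above it. The helper `ecdf_signs` recomputes the same signs directly from the step functions as a cross-check.

```python
from typing import List, Sequence


def route(xs: Sequence[float], ys: Sequence[float]) -> str:
    """Merge the samples; 'x' is a horizontal pace, 'y' a vertical pace."""
    merged = sorted([(x, "x") for x in xs] + [(y, "y") for y in ys])
    return "".join(label for _, label in merged)


def sign(z: int) -> int:
    return (z > 0) - (z < 0)


def route_signs(r: str, n1: int, n2: int) -> List[int]:
    """Sign of n2*i - n1*j at each interior lattice point of the route (O and P excluded)."""
    i = j = 0
    out = []
    for step in r[:-1]:
        if step == "x":
            i += 1
        else:
            j += 1
        out.append(sign(n2 * i - n1 * j))
    return out


def ecdf_signs(xs: Sequence[float], ys: Sequence[float]) -> List[int]:
    """Sign of F1 - F2 between consecutive pooled observations, from the step functions."""
    pts = sorted(list(xs) + list(ys))
    n1, n2 = len(xs), len(ys)
    out = []
    for a, b in zip(pts, pts[1:]):
        t = (a + b) / 2
        i = sum(x <= t for x in xs)           # n1 * F1(t)
        j = sum(y <= t for y in ys)           # n2 * F2(t)
        out.append(sign(n2 * i - n1 * j))
    return out


def no_common_point(xs, ys) -> bool:
    """Problem A event: F1 - F2 keeps one strict sign between min and max."""
    s = route_signs(route(xs, ys), len(xs), len(ys))
    return all(z == 1 for z in s) or all(z == -1 for z in s)


print(no_common_point([1, 2, 3], [4, 5, 6]), no_common_point([1, 4], [2, 3]))  # True False
```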

As an illustration we shall give the step curves and the routes in the lattice for two series, in one of which the step curves do not have a point in common (and where, therefore, the route in the lattice lies entirely on one side of the diagonal), while in the other the step curves intersect¹. The sequence of ordered samples in Fig. 1 is (roman type denoting $x$'s and italic denoting $y$'s) 2.0, 2.3, 2.4, 2.6, 2.7, 2.9, 3.0, 3.1, 3.3, 3.4, 3.6, 3.8, 4.1. The sequence in Fig. 2 is 2.0, 2.3, 2.5, 2.6, 2.8, 2.9, 3.1, 3.2, 3.4, 3.5, 3.6, 4.3, 4.5.


The number of all possible routes from $O$ to $P$ is $\binom{n_1 + n_2}{n_1} = T$. We shall now calculate the number $A$ of all routes A² from $O$ to $P$ lying below the diagonal $OP$. The fraction $A/T$ then gives the probability that of two empirical cumulative distribution curves of samples from one population the second lies entirely above the first. As each of the samples may be chosen as the first, the probability of no intersections of the step curves will be $2A/T$.

¹ It will be clear that if the paces in both directions have unit length, the route divides the rectangle into two parts whose areas are respectively $U$ and $n_1 n_2 - U$, where $U$ is the statistic defined by Mann and Whitney for the test of Wilcoxon [4].

² We use $A$ both to indicate a route lying entirely to the right of the diagonal and to indicate the number of these routes.
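Footnote 1 can be checked numerically. With unit paces, the region below a route consists of one column of height $j$ for each horizontal pace taken after $j$ vertical ones, so its area counts the pairs $(x_i, y_j)$ with $y_j < x_i$, which is one form of the Mann-Whitney count. A minimal sketch (names ours):

```python
from typing import Sequence


def route(xs: Sequence[float], ys: Sequence[float]) -> str:
    """Merge the samples; 'x' is a horizontal pace, 'y' a vertical pace."""
    merged = sorted([(x, "x") for x in xs] + [(y, "y") for y in ys])
    return "".join(label for _, label in merged)


def area_below(r: str) -> int:
    """Number of unit squares under the route: each horizontal pace taken
    at height j contributes a column of j squares."""
    height = area = 0
    for step in r:
        if step == "y":
            height += 1
        else:
            area += height
    return area


def mann_whitney_count(xs: Sequence[float], ys: Sequence[float]) -> int:
    """Number of pairs (x_i, y_j) with y_j < x_i."""
    return sum(1 for x in xs for y in ys if y < x)


xs, ys = [0.1, 0.5, 0.9], [0.3, 0.4, 0.6, 0.7]
r = route(xs, ys)
print(r, area_below(r), mann_whitney_count(xs, ys))   # xyyxyyx 6 6
```

Interchanging the roles of the two samples gives the complementary area $n_1 n_2 - U$.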


The number $A$ of routes lying below the diagonal $OP$ depends on the number of lattice points on $OP$, that is to say, on the greatest common divisor of $n_1$ and $n_2$. If $n_1 = n_2 = n$, all routes reaching the diagonal will reach it at a lattice point, as no route can intersect the diagonal except at a lattice point. If $n_1$ and $n_2$ are coprime there are no lattice points on the diagonal (except the endpoints $O$ and $P$), while if $n_1$ and $n_2$ $(n_1 \ne n_2)$ have a greatest common divisor $d > 1$, there are $d - 1$ lattice points on the diagonal between $O$ and $P$; so at $n_1 - d$ points a vertical route section, and at $n_2 - d$ points a horizontal route section, can intersect the diagonal outside a lattice point.

Fig. 2

So the number of lattice points available to a route under the diagonal $OP$, relative to the total number of lattice points, is highest if $n_1$ and $n_2$ are coprime and lowest if $n_1 = n_2$. It stands to reason that the proportion of routes $A$ is higher in the first case than in the second. This we shall prove. For the intermediate case (greatest common divisor $d$ of $n_1$ and $n_2$ greater than 1, $n_1 \ne n_2$) we shall prove that the number of routes $A$ relative to the total number of routes $T$ is always less than when $n_1$ and $n_2$ are coprime. Probably this ratio is always higher than when $n_1 = n_2$, but we were not able to prove it.
4. Determination of the number of routes A in the case $n_1 = n_2 = n$. In this case (Fig. 3) the lattice is a square with $(n + 1)^2$ points. We shall not determine the number of routes $A$ directly; instead we shall first determine the number of routes that start with a horizontal step $OR$ (and so could belong to the class A) and have at least one point in common with the diagonal. It will be proved that this number equals twice the number of routes starting with a horizontal step and ending with a horizontal step. The proof given is essentially the proof found by André.





Fig. 3


The last step of a route "not-A" which starts with $OR$ can be either $S'P$ or $SP$. Routes ending with $S'P$ must cross the diagonal $OP$ and are therefore routes "not-A"; their number is $\binom{2n-2}{n}$, since from $R$ to $S'$ there remain $n - 2$ horizontal and $n$ vertical paces.

To prove that the number of routes "not-A" ending in $SP$ equals the number of routes ending in $S'P$, we shall show that there exists a one-one correspondence between the routes "not-A" ending in $SP$ and the routes ending in $S'P$. A route "not-A" like $ORQSP$ can be transformed into a route ending in $S'P$ by reflecting the part $QSP$ about $OP$ to $QS'P$. Here the point $Q$ is the last point on the route before $P$ that lies on the diagonal $OP$; each route "not-A" ending in $SP$ can therefore be transformed in one way only into a route ending in $S'P$. On the other hand, each route beginning with $OR$ and ending with $S'P$ will cross the diagonal $OP$ at least once. By reflecting about the diagonal $OP$ that part of the route which lies between $P$ and the point $Q$ where, going back from $P$, it first reaches $OP$, the route will be transformed into a route "not-A" ending in $SP$. This route "not-A" ending in $SP$ is also uniquely determined by the route ending in $S'P$. So we have proved the one-one correspondence between the routes "not-A" ending in $SP$ and the routes "not-A" ending in $S'P$. The total number of routes "not-A" starting with $OR$ is therefore $2\binom{2n-2}{n}$.
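André's reflection argument can be checked by brute force for small $n$. The sketch below (ours; 'x' denotes a horizontal and 'y' a vertical pace) enumerates all routes starting with $OR$, counts those that are "not-A", and applies the reflection of the part after the last diagonal point $Q$, which amounts to interchanging horizontal and vertical paces there.

```python
from itertools import combinations
from math import comb


def routes(n: int):
    """All routes from O = (0, 0) to P = (n, n)."""
    for xpos in combinations(range(2 * n), n):
        s = set(xpos)
        yield "".join("x" if t in s else "y" for t in range(2 * n))


def is_A(r: str) -> bool:
    """All interior lattice points strictly below the diagonal (more x's than y's)."""
    i = j = 0
    for step in r[:-1]:
        i, j = (i + 1, j) if step == "x" else (i, j + 1)
        if i <= j:
            return False
    return True


def reflect_after_last_touch(r: str) -> str:
    """Interchange x and y after Q, the last point before P lying on OP."""
    i = j = 0
    last = 0
    for t, step in enumerate(r[:-1]):
        i, j = (i + 1, j) if step == "x" else (i, j + 1)
        if i == j:
            last = t + 1
    swap = {"x": "y", "y": "x"}
    return r[:last] + "".join(swap[c] for c in r[last:])


for n in range(2, 8):
    start_x = [r for r in routes(n) if r[0] == "x"]
    not_a = [r for r in start_x if not is_A(r)]
    ends_sp = [r for r in not_a if r[-1] == "y"]            # last pace SP (vertical)
    ends_s_prime_p = [r for r in start_x if r[-1] == "x"]   # last pace S'P (horizontal)
    images = {reflect_after_last_touch(r) for r in ends_sp}
    print(n, len(not_a) == 2 * comb(2 * n - 2, n), images == set(ends_s_prime_p))
```

Because the reflected part never returns to the diagonal, $Q$ is unchanged by the map, which is therefore its own inverse.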
The total number of routes starting with $OR$ is $\binom{2n-1}{n}$; therefore the number of routes $A$ is

$$A = \binom{2n-1}{n} - 2\binom{2n-2}{n} = \Big(1 - \frac{2(n-1)}{2n-1}\Big)\binom{2n-1}{n} = \frac{1}{2n-1}\binom{2n-1}{n}.$$

The total number $T$ of routes from $O$ to $P$ is $\binom{2n}{n}$. Since $\binom{2n-1}{n} = \frac{1}{2}\binom{2n}{n}$, the probability that a route chosen at random lies either entirely to the right or entirely to the left of the diagonal equals

$$\frac{2A}{T} = \frac{2}{2n-1}\,\binom{2n-1}{n}\Big/\binom{2n}{n} = \frac{1}{2n-1}.$$


The probability that the cumulative frequency curves from two random samples of size $n$ from the same population have no points in common (except the endpoints) is therefore $1/(2n - 1)$.
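The formulas for $A$ and for the probability $1/(2n-1)$ can be confirmed by counting routes directly for small $n$. The brute-force sketch below is ours and uses exact rational arithmetic:

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def one_sided_counts(n1: int, n2: int):
    """(below, above): numbers of routes from O to P whose interior lattice points
    all lie strictly below (resp. strictly above) the diagonal OP."""
    below = above = 0
    N = n1 + n2
    for xpos in combinations(range(N), n1):
        s = set(xpos)
        i = j = 0
        signs = set()
        for t in range(N - 1):               # interior points: O and P excluded
            if t in s:
                i += 1
            else:
                j += 1
            d = n2 * i - n1 * j
            signs.add((d > 0) - (d < 0))
        if signs == {1}:
            below += 1
        elif signs == {-1}:
            above += 1
    return below, above


for n in range(1, 9):
    A, A_above = one_sided_counts(n, n)
    print(n, A, Fraction(A + A_above, comb(2 * n, n)))   # the fraction equals 1/(2n - 1)
```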


5. Determination of the number of routes A in the case $n_1$ and $n_2$ coprime. In this case (Fig. 4) there are no lattice points on the diagonal except the endpoints, and if through any lattice point (except the endpoints) a line parallel to the diagonal is drawn, no other lattice point will lie on this line; for if there were two lattice points $(x_1, y_1)$ and $(x_2, y_2)$ on this line, then the triangle with vertices $(x_1, y_1)$, $(x_2, y_2)$ and $(x_2, y_1)$ would be similar to the triangle with vertices $(0, 0)$, $(n_1, n_2)$ and $(n_1, 0)$; so $(y_2 - y_1)/(x_2 - x_1) = n_2/n_1$, where $(y_2 - y_1)$ and $(x_2 - x_1)$ are integers smaller than $n_2$ and $n_1$ respectively. But this is impossible, as $n_1$ and $n_2$ are coprime.

A route A like $OQP$ passes through $n_1 + n_2 - 1$ lattice points ($O$ and $P$ excluded). If this route is cut at any of those lattice points (like $Q$) and the two parts are interchanged, the new route will not be a route A, that is to say it will not lie entirely to the right of the diagonal $OP$. For the angle $PQC'$ is greater than the angle $POC$, so that if $Q$ is placed at $O$ then $P$ will lie at a point $Q'$ to the left of $OP$. Furthermore, a straight line through $Q'$ parallel to $OP$ will not intersect the polygon $OQ'P$ anywhere: the part $OQ'$ is not intersected because $OP$ does not intersect the part $QP$ of the original route, and $Q'P$ is not intersected because $OP$ does not intersect $OQ$ (for $OQP$ is a route A, that is, by definition a route not intersected by $OP$). If we cut the route $OQ'P$ at $Q'$ (which point is uniquely determined as being the first point met by a line parallel to $OP$ moved from $D$ towards $P$) and interchange the two parts $OQ'$ and $Q'P$, the original route $OQP$ will be reconstructed. On each route $OQ'P$ which passes through at least one lattice point on the left-hand side of $OP$, and only on these routes, one and only one such point $Q'$ can be found; therefore a route $OQ'P$ (not-A) gives after transformation only one route $OQP$ (A). On the other hand, two different cuts of a route A will give after transformation two different routes, because if the coordinates of the section points are $(x_1, y_1)$ and $(x_2, y_2)$ respectively, the coordinates of the corresponding images $Q'_1$ and $Q'_2$ of $P$ will be $(n_1 - x_1, n_2 - y_1)$ and $(n_1 - x_2, n_2 - y_2)$ respectively, which points are different. As each route "not-A" has only one point $Q'$, two routes with different points $Q'$ are different. It is also impossible that two different routes "A" give after section the same route "not-A", because the transformation of a route "not-A" to a route "A" is unique. As all routes either lie entirely to the right of $OP$ (are routes "A") or have at least one point to the left of $OP$, and as each route A gives by the $n_1 + n_2 - 1$ possible cuts $n_1 + n_2 - 1$ different routes "not-A", the total number $T$ of routes from $O$ to $P$ equals $A + (n_1 + n_2 - 1)A = (n_1 + n_2)A$. Therefore the probability that a randomly chosen route is a route A equals $1/(n_1 + n_2)$. The probability that two empirical cumulative distribution curves, from two samples of sizes $n_1$ and $n_2$ ($n_1$ and $n_2$ coprime) from the same population, do not intersect is therefore $2/(n_1 + n_2)$.
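For coprime $n_1$, $n_2$, cutting a route at the point reached after $t$ paces and interchanging the two parts is a cyclic shift of its sequence of paces. The sketch below (ours) checks, for a few coprime pairs, that every route has $n_1 + n_2$ distinct cyclic shifts of which exactly one is a route A, and hence that $A/T = 1/(n_1 + n_2)$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, gcd


def all_routes(n1: int, n2: int):
    """All routes from O = (0, 0) to P = (n1, n2)."""
    N = n1 + n2
    for xpos in combinations(range(N), n1):
        s = set(xpos)
        yield "".join("x" if t in s else "y" for t in range(N))


def is_route_A(r: str, n1: int, n2: int) -> bool:
    """All interior lattice points strictly to the right of (below) OP."""
    i = j = 0
    for step in r[:-1]:
        if step == "x":
            i += 1
        else:
            j += 1
        if n2 * i - n1 * j <= 0:
            return False
    return True


def check_cycle_argument(n1: int, n2: int) -> Fraction:
    """Verify the one-in-(n1 + n2) property route by route; return A/T."""
    if gcd(n1, n2) != 1:
        raise ValueError("n1 and n2 must be coprime")
    N = n1 + n2
    A = 0
    for r in all_routes(n1, n2):
        shifts = {r[t:] + r[:t] for t in range(N)}   # cut after t paces, interchange parts
        assert len(shifts) == N
        assert sum(is_route_A(s, n1, n2) for s in shifts) == 1
        A += is_route_A(r, n1, n2)
    return Fraction(A, comb(N, n1))


for n1, n2 in [(1, 2), (2, 3), (3, 4), (3, 5), (2, 7), (4, 5)]:
    print(n1, n2, check_cycle_argument(n1, n2))       # equals 1/(n1 + n2)
```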





Fig. 4 


6. Determination of the number of routes A in the case $n_1$ and $n_2$ have a common divisor greater than 1. If $n_1$ and $n_2$ $(n_1 \ne n_2)$ have a greatest common divisor $d > 1$, there are $d - 1$ lattice points on the diagonal (excluding the endpoints). In this case the routes "not-A" can be divided into two groups: "not-$A_1$", routes which pass through at least one lattice point on the left-hand side of the diagonal, and "not-$A_2$", routes which pass through one or more lattice points on the diagonal but do not pass through a lattice point on the left-hand side of the diagonal. A cut followed by an interchange of the two halves of a route "A" will transform it into a route "not-$A_1$". A cut followed by an interchange of the two halves of a route "not-$A_2$" will transform it either into a route "not-$A_1$" or into another or the same route "not-$A_2$" (if the cut falls on the diagonal). So the total number of routes is

$$T = (n_1 + n_2 - 1)A + A + (\text{routes "not-}A_2\text{"}) + (\text{routes "not-}A_1\text{" resulting from cuts in routes "not-}A_2\text{"}) = (n_1 + n_2)A + x,$$

where $x > 0$ is the number of routes in the last two groups. The number of routes A is therefore less than the $1/(n_1 + n_2)$th part of all the routes.
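For small lattices with $\gcd(n_1, n_2) > 1$ the inequality $A < T/(n_1 + n_2)$ can be confirmed by direct counting. The sketch below is ours; it checks only the inequality proved above, not the conjectured comparison with the case $n_1 = n_2$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, gcd


def count_A(n1: int, n2: int) -> int:
    """Routes whose interior lattice points all lie strictly below the diagonal OP."""
    N = n1 + n2
    total = 0
    for xpos in combinations(range(N), n1):
        s = set(xpos)
        i = j = 0
        ok = True
        for t in range(N - 1):               # interior points: O and P excluded
            if t in s:
                i += 1
            else:
                j += 1
            if n2 * i - n1 * j <= 0:
                ok = False
                break
        total += ok
    return total


for n1, n2 in [(2, 4), (4, 6), (3, 6), (6, 4), (4, 8), (6, 9)]:
    A, T = count_A(n1, n2), comb(n1 + n2, n1)
    print(n1, n2, gcd(n1, n2), Fraction(A, T), Fraction(1, n1 + n2))
```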

It may seem rather strange that the probability in the case $n_1$, $n_2$ coprime is about twice the probability in the case $n_1 = n_2$. This discontinuity is of course caused by the fact that in the case $n_1 = n_2$ both distribution curves may have one or more points in common (except the endpoints) without crossing each other (in other words, the route may meet the diagonal without crossing it). In the case that $n_1$ and $n_2$ are coprime this is impossible.

If in the case $n_1 = n_2$ we seek the probability that either $F_1(x) - F_2(x) \le 0$ or $F_1(x) - F_2(x) \ge 0$ (instead of $F_1(x) - F_2(x) < 0$ or $F_1(x) - F_2(x) > 0$), it will be found (by applying formula (1) of the next section with $h = 1$) that this probability is $2/(n + 1)$. Therefore under these conditions the probability for $n_1 = n_2 = n$ is about twice as large as for $n_1$ coprime to $n_2$.
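The value $2/(n+1)$ for the weak inequalities can likewise be checked by enumeration. In the sketch below (ours) a route is counted when $i - j$ never changes sign along it, zeros being allowed; since the first pace always leaves the diagonal, no route is counted twice.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def weakly_one_sided_probability(n: int) -> Fraction:
    """Pr(F1 - F2 <= 0 everywhere or F1 - F2 >= 0 everywhere) for two samples of size n."""
    N = 2 * n
    count = 0
    for xpos in combinations(range(N), n):
        s = set(xpos)
        i = j = 0
        lo = hi = 0                  # smallest and largest value of i - j seen so far
        for t in range(N):
            if t in s:
                i += 1
            else:
                j += 1
            lo, hi = min(lo, i - j), max(hi, i - j)
        if lo >= 0 or hi <= 0:
            count += 1
    return Fraction(count, comb(N, n))


for n in range(1, 9):
    print(n, weakly_one_sided_probability(n))   # equals 2/(n + 1)
```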

Consequently, no direct statistical test can be based on these results, as one of the referees remarked. However, should one find in an investigation that of two empirical distribution curves one lies entirely above the other, the formulas given above enable one to calculate the probability that such a result is caused by random sampling fluctuations.


Fig. 5


7. Probability that the maximum difference of two empirical distribution
curves from two samples of size n from one population is at least h/n. To solve
this problem (exact solution of the problem of Smirnov in the case of equal
samples) we shall again use the representation of our two samples by the lattice
OCPD (Fig. 5). As n₁ = n₂ = n, this lattice is a square. All routes from O to P
that reach a point on one of the lines EF, GH or on both lines, or that intersect
one or both of these lines, represent pairs of samples for which, for some value x,
the difference |F₁(x) − F₂(x)| is at least OF/PC = OF/n.

To solve this problem we need the following lemma. 

Lemma. The number of routes in a rectangular lattice with sides n₁ and n₂,
such that somewhere the number of vertical paces y exceeds the number of horizontal
paces x by at least h, is $\binom{n_1+n_2}{n_1+h}$. (An algebraic solution of this problem is given
in Whitworth, Proposition XXIX [8]. We shall give here a geometrical solution
that can be extended to the problem of Smirnov.)

Proof. All routes such that somewhere the number of vertical paces exceeds
the number of horizontal paces by at least h are routes that reach or intersect a
line EF, which makes an angle of 45° with DO (Fig. 6).










570 E. F. DRION 


We shall cut a route such as OGP′, which somewhere reaches the line EF, at
the point G where it reaches this line for the first time. The part OG is reflected
about EF to O′G; the part GP′ is left in its place.

A route from O to P′ reaching or intersecting EF may thus be transformed in
one way into a route from O′ to P′. We may transform the route O′P′ back to
the route OGP′ by cutting it at the first place where it reaches EF and reflecting
O′G about EF. Hence there is a one-to-one correspondence between the
routes from O to P′ reaching or intersecting EF and the routes (O′GP′) from O′
to P′. If the sides of the lattice measure n₁ (= OC′) and n₂ (= C′P′) and if
OE = O′E measures h, then the number of routes from O′ to P′, and therefore the
number of routes from O to P′ reaching or intersecting the line EF, is

$$\binom{n_1+h+n_2-h}{n_1+h} = \binom{n_1+n_2}{n_1+h}. \tag{1}$$

Fig. 6
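Formula (1) can be checked numerically with a hypothetical brute-force sketch, which is not from the paper. Over all routes with n₁ horizontal and n₂ vertical paces, it counts those on which the excess of vertical over horizontal paces reaches h at some point. We assume h ≥ 1 and h > n₂ − n₁, so that the end point P′ lies below EF, as the reflection argument implicitly requires.

```python
from itertools import combinations
from math import comb


def count_reaching(n1, n2, h):
    """Routes (n1 horizontal, n2 vertical paces) on which, at some point,
    the vertical paces exceed the horizontal paces by at least h."""
    count = 0
    for vertical_positions in combinations(range(n1 + n2), n2):
        vertical = set(vertical_positions)
        excess = 0
        for step in range(n1 + n2):
            excess += 1 if step in vertical else -1
            if excess >= h:
                count += 1
                break
    return count


def reflection_count(n1, n2, h):
    """Formula (1): C(n1 + n2, n1 + h); math.comb returns 0 when n1 + h > n1 + n2."""
    return comb(n1 + n2, n1 + h)


# Assumption: h >= 1 and the end point lies below EF (n2 - n1 < h).
for n1 in range(1, 7):
    for n2 in range(1, 7):
        for h in range(max(1, n2 - n1 + 1), n2 + 2):
            assert count_reaching(n1, n2, h) == reflection_count(n1, n2, h)
print("formula (1) agrees with brute force for n1, n2 <= 6")
```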
7.1. Classification of routes OP representing empirical distributions where
max |F₁(x) − F₂(x)| ≥ h/n. The solution of the problem of Smirnov is, even in the
case of equal samples, rather complicated, because the empirical distribution
curves F₁(x) and F₂(x) may intersect more than once. Therefore it is possible
that there are one or more values of x such that F₁(x) − F₂(x) ≥ h/n and, in the
same pair of samples, other values of x such that F₂(x) − F₁(x) ≥ h/n. In other
words, the route may intersect or touch both lines a = EF and b = GH. We may
classify the routes which touch or cross either one or both of the lines a and b
in the following way (Fig. 5):

A. routes touching or crossing only a,

B. routes touching or crossing only b,

C. routes touching or crossing a first (one or more times) and afterwards touching
or crossing b (after having touched or crossed b, these routes may also touch
or cross a again),

D. routes touching or crossing b first (one or more times) and afterwards touching
or crossing a (after having touched or crossed a, these routes may also touch
or cross b again).

The letters A, B, C and D will also be used for the number of routes in these
categories. In the same way we will use the letters a and b also for the number
of all routes that touch or cross a (respectively b), whether or not they also touch
or cross b (respectively a). By reasons of symmetry it is clear that a = b, A = B
and C = D. Furthermore a = A + C + D and b = B + C + D. The total number of
routes touching or crossing either a or b or both is
A + B + C + D = a + b − (C + D) = 2(a − D).
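The bookkeeping above can be spot-checked on a small lattice. In the following sketch (our own illustration, with hypothetical names), d = n(F₁ − F₂) moves up by 1 for an x-observation and down by 1 for a y-observation. Line a is taken as d = +h and line b as d = −h, and each route is classified by the order in which it first reaches the two lines.

```python
from itertools import combinations
from math import comb

# Order of first contact with the lines -> category of Section 7.1.
LABELS = {("a",): "A", ("b",): "B", ("a", "b"): "C", ("b", "a"): "D"}


def classify_routes(n, h):
    """Count the categories A, B, C, D of Section 7.1 for n1 = n2 = n, h >= 1."""
    counts = {"A": 0, "B": 0, "C": 0, "D": 0, "neither": 0}
    for x_positions in combinations(range(2 * n), n):
        xs = set(x_positions)
        d = 0
        first_reached = []  # lines in the order in which they are first touched
        for step in range(2 * n):
            d += 1 if step in xs else -1
            if d >= h and "a" not in first_reached:
                first_reached.append("a")
            if d <= -h and "b" not in first_reached:
                first_reached.append("b")
        counts[LABELS.get(tuple(first_reached), "neither")] += 1
    return counts


n, h = 6, 2
k = classify_routes(n, h)
a = k["A"] + k["C"] + k["D"]            # every route touching or crossing a
assert a == comb(2 * n, n + h)          # the lemma with n1 = n2 = n
assert k["A"] == k["B"] and k["C"] == k["D"]
assert k["A"] + k["B"] + k["C"] + k["D"] == 2 * (a - k["D"])
print(k)
```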


The number of routes a is given by our lemma, viz. $\binom{2n}{n+h}$, so we have only
to find the number of routes D.


7.2. Calculation of the number of routes D. The number of routes D may be
found by a repeated application of the device used for calculating a (Fig. 7).


We rotate the rectangle OFKC about HK, leaving unchanged that part of the route
from O to P which begins at the point where this route touches or crosses HK for
the first time. After the transformation the routes D are the routes from O′ to P
which touch or cross F′G′ without having first crossed the line HK. The lengths
of the sides of the rectangle O′P (indicating this rectangle by the ends of one of
its diagonals) are n − h and n + h. To determine the total number of routes
touching or crossing F′G′, we rotate the rectangle O′G′ about F′G′, leaving
unchanged that part of the route from O′ to P which begins at the point where
this route touches or crosses F′G′ for the first time. The new rectangle will have
sides n − 2h and n + 2h. The total number of routes in the rectangle O″P is
$\binom{2n}{n-2h}$; this is therefore the total number of routes in the rectangle O′P
which touch or cross the line F′G′. To get the number of routes D we must
subtract from this number $\binom{2n}{n-2h}$ the number of routes in O″P touching or
crossing the image H′K′ of HK in O″P without having touched or crossed F′G′.
By rotating the rectangle O″K′ about H′K′, we get a new rectangle with sides
n − 3h and n + 3h. The total number of routes in this rectangle is $\binom{2n}{n-3h}$;
among these, the routes to be subtracted from $\binom{2n}{n-2h}$ are those which do
not touch or cross the image of F′G′, and they can be determined by repeating the
process of rotating. The law for the determination of the number of routes will
now be clear. Their number is

$$D = \binom{2n}{n-2h} - \binom{2n}{n-3h} + \binom{2n}{n-4h} - \cdots,$$

the series being continued as long as n − kh ≥ 0.


The total number of routes which cross either one or both lines HK and FG is
therefore, since $\binom{2n}{n+h} = \binom{2n}{n-h}$,

$$2(a - D) = 2\left[\binom{2n}{n-h} - \binom{2n}{n-2h} + \binom{2n}{n-3h} - \cdots\right].$$

As all the routes from O to P number $\binom{2n}{n}$, the probability that a randomly
chosen route touches or crosses HK or FG or both is

$$P = \frac{2\left[\binom{2n}{n-h} - \binom{2n}{n-2h} + \binom{2n}{n-3h} - \cdots\right]}{\binom{2n}{n}}.$$

This is therefore also the probability that the maximum difference of the
cumulative frequency-curves of two random samples from the same population is at
least h/n.
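The closed form lends itself to exact rational arithmetic. The sketch below is our own, with hypothetical function names. It evaluates the alternating series and compares it with a direct count over all C(2n, n) equally likely routes for small n. It assumes h ≥ 1; for h ≤ 0 the probability is trivially 1.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def smirnov_exact(n, h):
    """P(max |F1(x) - F2(x)| >= h/n), two samples of size n from one continuous
    population: 2 * [C(2n, n-h) - C(2n, n-2h) + C(2n, n-3h) - ...] / C(2n, n)."""
    if h <= 0:
        return Fraction(1)  # the maximum difference is always >= 0
    alternating_sum = 0
    k = 1
    while n - k * h >= 0:
        alternating_sum += (-1) ** (k + 1) * comb(2 * n, n - k * h)
        k += 1
    return Fraction(2 * alternating_sum, comb(2 * n, n))


def smirnov_brute_force(n, h):
    """The same probability, counting routes that reach |n (F1 - F2)| >= h."""
    hits = 0
    for x_positions in combinations(range(2 * n), n):
        xs = set(x_positions)
        d = 0
        for step in range(2 * n):
            d += 1 if step in xs else -1
            if abs(d) >= h:
                hits += 1
                break
    return Fraction(hits, comb(2 * n, n))


for n in range(1, 8):
    for h in range(1, n + 1):
        assert smirnov_exact(n, h) == smirnov_brute_force(n, h)

print(round(float(smirnov_exact(20, 8)), 4))  # 0.0811, the n = 20, h = 8 row of Table I
```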


TABLE I

   n     d    h = nd   P, exact   P, Smirnov
  20   .25       5      .5713       .5596
  20   .40       8      .0811       .0815
  20   .45       9      .0335       .0349
  20   .50      10      .0123       .0135
  50   .16       8      .5487       .5441
  50   .24      12      .1124       .1123
  50   .28      14      .0392       .0396
  50   .32      16      .0115       .0120
 100   .12      12      .4695       .4676
 100   .17      17      .1112       .1112
 100   .19      19      .0539       .0541
 100   .23      23      .0099       .0101


8. Some numerical results. We have calculated the probability P that
max |F₁(x) − F₂(x)| ≥ d for samples of size n = 20, 50, 100 and for values
of d such that P ≈ 0.50, 0.10, 0.05 and 0.01, by means of the exact formula and
by the asymptotic formula of Smirnov. The results are given in Table I.

The figures given in the last column were found by linear interpolation in the
table of Smirnov [6].

With equal-sized samples the asymptotic formula of Smirnov gives very
satisfactory results even for samples of 20. We suspect, however, that when the
samples are of unequal size the agreement will be less satisfactory, especially if
n₁ and n₂ are coprime, because in this case there is only one lattice point on HK
and FG, which must in this case be parallel to the diagonal OP (cf. Fig. 7).
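The comparison in Table I can be reproduced approximately. The last column was obtained by interpolating in Smirnov's table. The sketch below instead sums Smirnov's limiting series 2 Σ (−1)^(k−1) exp(−2k²λ²) directly, taking λ = d√(n₁n₂/(n₁ + n₂)) = d√(n/2) for equal samples. This evaluation is our own assumption, so the fourth decimal may occasionally differ from the printed table.

```python
from math import comb, exp, sqrt


def smirnov_exact(n, h):
    """Exact two-sided probability of Section 7 (equal sample sizes, h >= 1)."""
    total, k = 0, 1
    while n - k * h >= 0:
        total += (-1) ** (k + 1) * comb(2 * n, n - k * h)
        k += 1
    return 2 * total / comb(2 * n, n)


def smirnov_asymptotic(n, d, terms=50):
    """Limiting formula 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2), lam = d * sqrt(n / 2)."""
    lam = d * sqrt(n / 2)
    return 2 * sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, terms + 1))


rows = [(20, 5), (20, 8), (20, 9), (20, 10), (50, 8), (50, 12), (50, 14), (50, 16),
        (100, 12), (100, 17), (100, 19), (100, 23)]
for n, h in rows:
    d = h / n
    print(f"{n:4d}  {d:.2f}  {h:3d}  exact {smirnov_exact(n, h):.4f}  "
          f"asymptotic {smirnov_asymptotic(n, d):.4f}")
```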


9. Concluding remarks. (a) The probabilities given above are based on the
assumption that the distribution-functions of the population are continuous. In
practice, however, almost all distribution-functions are discontinuous, owing to
the limited accuracy of our measurements. In other words, in practice we always
work with grouped data, although the classes may be so small that no class
contains more than one observation and many contain none. Nevertheless, when
the number of observations is large enough, more than one observation will be
found in several classes.

Let the width of the classes be h, so that the values of F(x) (i.e. the cumulative
experimental distribution-function) are only known for x = hg (with g an integer
between $g_a = [a_1/h]$ and $g_b = [a_2/h] + 1$, where [x/h] denotes the integer part
of x/h). Suppose that, of two ungrouped samples $x_1, \dots, x_{n_1}$ and $y_1, \dots, y_{n_2}$,
the cumulative experimental distribution curve of the y's lies entirely to the
right-hand side of that of the x's, i.e. that F₁(x) > F₂(x) for
$a_1 = \min(x_1, y_1) \le x \le \max(x_{n_1}, y_{n_2}) = a_2$. Then, after grouping,

$$F_1(hg_i) > F_2(hg_i), \qquad g_1 = \min\left(\left[\tfrac{x_1}{h}\right] + 1, \left[\tfrac{y_1}{h}\right] + 1\right) \le g_i \le g_2 = \max\left(\left[\tfrac{x_{n_1}}{h}\right], \left[\tfrac{y_{n_2}}{h}\right]\right).$$

But the converse need not hold. Therefore the probability that F₁(hgᵢ) > F₂(hgᵢ)
for all values of gᵢ between g₁ and g₂ (g₁ and g₂ included) is greater than or
equal to the probability that F₁(x) > F₂(x) for all values of x between a₁ and a₂
(a₁ and a₂ included).

If, however, $F_1(hg_i) > F_2(hg_{i+1})$ for $g_1 \le g_i \le g_2 - 1$, then F₁(x) > F₂(x)
for a₁ ≤ x < a₂, although the converse need not hold. Therefore the probability
that $F_1(hg_i) > F_2(hg_{i+1})$ for all values of gᵢ between g₁ and g₂ − 1 is less than
or equal to the probability that F₁(x) > F₂(x) for all values of x between a₁ and
a₂ (a₁ and a₂ included).

From this last result the following conclusion may be drawn. Take two grouped
random samples from the same population. Consider the probability that the
cumulative experimental frequencies of one of the samples are higher at all class
boundaries (which are the only values of the variate for which the cumulative
frequency is known) than the cumulative frequencies of the other sample at the
class boundaries of the next higher class. This probability is less than or equal
to the formulae given in Sections 4 and 5.

(b) The formula given in Section 7 for the Smirnov test applies to the two-sided
test. If we are only interested in deviations in one direction, the formula is much
simpler. With equal-sized samples from the same population the probability that
F₁(x) − F₂(x) ≥ h/n for some x is $\binom{2n}{n-h} \Big/ \binom{2n}{n}$.
CONFIDENCE BOUNDS FOR A SET OF MEANS


By D. A. S. Fraser
University of Toronto


1. Summary. Professor John Tukey suggested the following two problems to
the author. Given that X₁, X₂, …, Xₙ are normally and independently distributed
with unknown means μ₁, μ₂, …, μₙ and given variance σ²:

PROBLEM A: Find a β-level confidence interval of the form
g(x₁, …, xₙ) ≥ μ₁, …, μₙ > −∞.
PROBLEM B: Find a β-level confidence interval of the form
g(x₁, …, xₙ) ≥ μ₁, …, μₙ ≥ h(x₁, …, xₙ).


The main result of this paper is the nonexistence of intervals satisfying mild
regularity conditions and having an exact confidence level (unless n = 1 or
β = 0, 1). However, for each problem an interval is given for which the confidence
level is greater than or equal to β (formulas (2.1), (4.1)); these intervals are
apparently shorter than those previously used in practice. The procedure for
obtaining any interval with at least β confidence is also described.

Some results are discussed for distributions other than the normal.


2. Introduction to Problem A.


2.1. Normal distributions. If X₁, …, Xₙ are normally and independently
distributed with known variance σ² and unknown means μ₁, …, μₙ, then Problem
A is to find an upper β-level confidence bound for the set {μ₁, …, μₙ};
that is, to find a function g(x₁, …, xₙ) such that
Pr{g(X₁, …, Xₙ) ≥ max μᵢ} ≥ β for all μ₁, …, μₙ.

One approach to this problem is to look for exact β-level confidence bounds:
the above condition on the function g(x₁, …, xₙ) is replaced by
Pr{g(X₁, …, Xₙ) ≥ max μᵢ} = β for all μ₁, …, μₙ. This more restrictive condition
in a confidence region problem is of course analogous to the requirement of
similarity in the theory of hypothesis testing.

In Section 3 Problem A is analyzed, but attention is confined to measurable
functions g(x₁, …, xₙ) which satisfy two mild restrictions. These restrictions
are given by the following assumptions concerning the function g(x₁, …, xₙ).


ASSUMPTION 2.1. For all x₁, …, xₙ, g(x₁ + δ, …, xₙ + δ) is a monotone
nondecreasing function of δ.


ASSUMPTION 2.2. If xᵢ = max xⱼ (i = 1, …, n), then g(x₁, …, xₙ) satisfies

$$g(x_1, \dots, x_n) \le g(x_1, \dots, x_{i-1}, x_i + \delta, x_{i+1}, \dots, x_n)$$

for all x₁, …, xₙ and for any positive δ.
The second assumption seems reasonable since a bound would certainly be
suspect if it were smaller for 27.2, 25.5, 26.3, 27.8 than for 27.2, 25.5, 26.3,
27.5.

It is then proved in Theorem 1 that there does not exist an exact β-level
confidence bound which satisfies these two assumptions.

As a by-product of Theorem 1 a bound having at least β confidence is
obtained; it is

$$g(x_1, \dots, x_n) = \max x_i + N_{1-\beta}\,\sigma, \tag{2.1}$$

where $N_{1-\beta}$ is the 1 − β point of the unit normal; that is, $N_\alpha$ is defined by

$$\frac{1}{\sqrt{2\pi}} \int_{N_\alpha}^{\infty} e^{-\frac{1}{2}x^2}\, dx = \alpha.$$

The optimum properties of this bound will be discussed in a later paper.
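Bound (2.1) and its actual confidence level can be computed in closed form. The sketch below is our own illustration with hypothetical names. The bound fails only if every Xᵢ falls below max μ − N₁₋ᵦσ, so its coverage is 1 − Πᵢ Φ((max μ − N₁₋ᵦσ − μᵢ)/σ). Because N_α is the upper α point, N₁₋ᵦ equals `NormalDist().inv_cdf(beta)` from the Python standard library. The coverage equals β when n = 1 and exceeds it otherwise, which is consistent with Theorem 1.

```python
from statistics import NormalDist


def upper_bound(xs, beta, sigma=1.0):
    """Bound (2.1): max x_i + N_{1-beta} * sigma.

    N_alpha is the upper alpha point of the unit normal, so N_{1-beta} is the
    point with probability beta below it."""
    return max(xs) + NormalDist().inv_cdf(beta) * sigma


def coverage(mus, beta, sigma=1.0):
    """Exact Pr{upper_bound(X) >= max mu_i} for independent X_i ~ N(mu_i, sigma^2).

    The bound fails only if every X_i lies below max(mu) - N_{1-beta} * sigma."""
    cut = max(mus) - NormalDist().inv_cdf(beta) * sigma
    miss = 1.0
    for mu in mus:
        miss *= NormalDist(mu, sigma).cdf(cut)
    return 1.0 - miss


print(upper_bound([27.2, 25.5, 26.3, 27.8], beta=0.95))
print(coverage([0.0], 0.95))                 # n = 1: equals beta up to rounding
print(coverage([0.0, 0.0, 0.0], 0.95))       # equal means: 1 - (1 - beta)^3
print(coverage([0.0, -40.0, -40.0], 0.95))   # one dominant mean: essentially beta
```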

The above bound, however, is not the only confidence bound. In Section 3 a
procedure is given for constructing bounds having at least β confidence. For this
it is convenient to restrict attention to bounds satisfying a more restrictive
version of Assumption 2.1. This Assumption 2.1* is obtained by applying the
principle of cogredience to the problem, using the transformations x′ᵢ = xᵢ + C,
i = 1, …, n, for all C.


ASSUMPTION 2.1*. The function g(x₁, …, xₙ) satisfies the equality

$$g(x_1 + \delta, \dots, x_n + \delta) = g(x_1, \dots, x_n) + \delta$$

for all x₁, …, xₙ, δ.


2.2. The general problem. The problem as described above is a particular case
of the following: given that X₁, …, Xₙ are independently distributed with
probability density functions f(x − μ₁), …, f(x − μₙ), find a β-level confidence
bound for the set {μ₁, …, μₙ}. Theorem 2 shows that if f(x − μ) satisfies a
condition of bounded completeness, then exact β-level bounds do not exist.
A bound having at least β confidence can of course always be obtained by adding
to max xᵢ the 1 − β point of the distribution f(x) (that is, with μ = 0).


3. Analysis of Problem A.


3.1. Characteristic function of a confidence bound. We define a characteristic
function for the bound g(x₁, …, xₙ) as follows:

$$\phi_0(x_1, \dots, x_n) = \begin{cases} 1, & g(x_1, \dots, x_n) \ge 0, \\ 0, & g(x_1, \dots, x_n) < 0. \end{cases} \tag{3.1}$$

From Assumption 2.1 we can infer that φ₀(x₁ + δ, …, xₙ + δ) is a monotone
nondecreasing function of δ.

To derive conditions on g(x₁, …, xₙ) from Assumption 2.2 we first define
disjoint sets S₁, …, Sₙ which cover Rⁿ except for a set of measure zero:

$$S_i = \{(x_1, \dots, x_n) \mid x_i > \max_{j \ne i} x_j\}. \tag{3.2}$$

The second assumption ensures that for points (x₁, …, xₙ) ∈ Sᵢ,
$\phi_0(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_n)$ is a monotone nondecreasing function of xᵢ.


3.2. Theorem for normal variables. To prove Theorem 1 we shall need the
following

LEMMA 1. If Y₁, …, Yₙ are normally and independently distributed with
means μ₁, …, μₙ and unit variances, then the set of densities corresponding to all
(μ₁, …, μₙ) ∈ [−∞, 0]ⁿ is boundedly complete; that is,

$$E\{\phi(Y_1, \dots, Y_n)\} = 0, \quad (\mu_1, \dots, \mu_n) \in [-\infty, 0]^n,$$

and

$$|\phi(y_1, \dots, y_n)| < M$$

imply

$$\phi(y_1, \dots, y_n) = 0$$

almost everywhere.

PROOF: The above set of distributions is complete; see Lehmann and Scheffé
[1]. Since completeness implies bounded completeness, the lemma follows.

THEOREM 1. If X₁, …, Xₙ are normal and independent with means
μ₁, …, μₙ and variance σ², there does not exist (unless β = 0, 1, or n = 1) a
measurable function g(x₁, …, xₙ), satisfying Assumptions 2.1 and 2.2, which is an
exact β confidence bound for max μᵢ.

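PROOF: Without loss of generality let σ² = 1. We consider a measurable
function g(x₁, …, xₙ) satisfying Assumptions 2.1 and 2.2. Assuming that g is an
exact β-level confidence bound, we shall derive a contradiction.

We have

$$\begin{aligned} \beta &= \Pr\{g(X_1, \dots, X_n) \ge \max_j \mu_j\} \\ &= E\{\phi_\theta(X_1, \dots, X_n) \mid \max_j \mu_j = \theta\} \\ &= E\{\beta_\theta^{(i)}(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n) \mid \max_{j \ne i} \mu_j \le \theta\}, \end{aligned} \tag{3.3}$$

where φ_θ is the characteristic function of the set {g(x₁, …, xₙ) ≥ θ}, the last
expectation is taken with μᵢ = θ, and

$$\beta_\theta^{(i)}(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \phi_\theta(x_1, \dots, x_n)\, e^{-\frac{1}{2}(x_i - \theta)^2}\, dx_i. \tag{3.4}$$

We now derive conditions on the function $\beta_\theta^{(i)}$; for simplicity let θ = 0.
From the expression above it is seen that

$$E\{\beta_0^{(i)}(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_n) - \beta \mid \max_{j \ne i} \mu_j \le 0\} = 0;$$

hence from Lemma 1 we conclude that

$$\beta_0^{(i)}(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = \beta \tag{3.5}$$

almost everywhere.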
Using the above condition on $\beta_0^{(i)}$, we obtain conditions on the function
φ₀(x₁, …, xₙ):

$$\beta = \beta_0^{(i)}(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \phi_0(x_1, \dots, x_n)\, e^{-\frac{1}{2}x_i^2}\, dx_i \quad \text{almost everywhere.} \tag{3.6}$$

Consider fixed $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n$ (not, of course, belonging to the
exceptional set of measure zero for which the equality (3.5) might not hold).
For $x_i > \max_{j \ne i} x_j$, φ₀(x₁, …, xₙ) is a monotone function; and since it is a
characteristic function it will have the following form:

$$\phi_0(x_1, \dots, x_n) = \begin{cases} 0, & \max_{j \ne i} x_j < x_i < u(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n), \\ 1, & u(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) < x_i < \infty. \end{cases} \tag{3.7}$$

Here $u(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$ is taken to be the value of xᵢ at which
φ₀(x₁, …, xₙ) jumps from 0 to 1, or $\max_{j \ne i} x_j$, whichever is larger. Using the
function u, we obtain

$$\beta = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\max_{j \ne i} x_j} \phi_0(x_1, \dots, x_n)\, e^{-\frac{1}{2}x_i^2}\, dx_i + \frac{1}{\sqrt{2\pi}} \int_{u(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)}^{\infty} e^{-\frac{1}{2}x_i^2}\, dx_i. \tag{3.8}$$

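However, since

$$0 \le \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\max_{j \ne i} x_j} \phi_0(x_1, \dots, x_n)\, e^{-\frac{1}{2}x_i^2}\, dx_i \le \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\max_{j \ne i} x_j} e^{-\frac{1}{2}x_i^2}\, dx_i = \Pr(X_i \le \max_{j \ne i} x_j),$$

then

$$N_\beta \le u(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) \le N_{\beta - P(\max_{j \ne i} x_j)}, \tag{3.9}$$

where

$$P(\max_{j \ne i} x_j) = \Pr\{X_i \le \max_{j \ne i} x_j\}.$$

The inequality on u implies that φ₀(x₁, …, xₙ) is equal to zero for almost all
points in Sᵢ having $x_i < N_\beta$. This is true for all i; hence φ₀(x₁, …, xₙ) = 0 if
$\max x_i < N_\beta$. Consider now $(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$ having $\max_{j \ne i} x_j < N_\beta$;
in expression (3.8) the first integral vanishes, leaving

$$\beta = \frac{1}{\sqrt{2\pi}} \int_{u(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)}^{\infty} e^{-\frac{1}{2}x^2}\, dx.$$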
Therefore

$$u(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = N_\beta, \qquad \max_{j \ne i} x_j < N_\beta.$$

From the above equality on u we obtain the following conditions on φ₀(x₁, …, xₙ):

$$\phi_0(x_1, \dots, x_n) = \begin{cases} 0, & \text{if } \max x_i < N_\beta, \\ 1, & \text{if exactly one } x_i \text{ is larger than } N_\beta. \end{cases}$$

But since φ₀(x₁ + δ, …, xₙ + δ) is monotone in δ, we have

$$\phi_0(x_1, \dots, x_n) = \begin{cases} 0, & \text{if } \max x_i < N_\beta, \\ 1, & \text{if } \max x_i > N_\beta. \end{cases} \tag{3.10}$$

Therefore

$$g(x_1, \dots, x_n) \begin{cases} < 0, & \text{if } \max x_i < N_\beta, \\ \ge 0, & \text{if } \max x_i > N_\beta. \end{cases}$$

Similarly

$$g(x_1, \dots, x_n) \begin{cases} < \theta, & \text{if } \max x_i < N_\beta + \theta, \\ \ge \theta, & \text{if } \max x_i > N_\beta + \theta. \end{cases} \tag{3.11}$$

This completely determines g(x₁, …, xₙ):

$$g(x_1, \dots, x_n) = \max x_i - N_\beta = \max x_i + N_{1-\beta}. \tag{3.12}$$


However, contrary to our original assumption, this function g(x₁, …, xₙ) is
not an exact β-level confidence bound unless β = 0, 1, or n = 1. For consider
$\beta_0^{(1)}(x_2, \dots, x_n)$; (3.5) gives

$$\beta_0^{(1)}(x_2, \dots, x_n) = \beta$$

almost everywhere, while the functional form of g(x₁, …, xₙ) above implies

$$\beta_0^{(1)}(x_2, \dots, x_n) = \begin{cases} \beta, & \max_{j \ne 1} x_j \le N_\beta, \\ 1, & \max_{j \ne 1} x_j > N_\beta. \end{cases}$$

These are obviously in conflict unless n = 1 or β = 0, 1. This completes the
proof of Theorem 1.

3.3. Examples of normal confidence bounds. Although an exact β-level confidence
bound satisfying Assumptions 2.1 and 2.2 does not exist, bounds with at least
β confidence do exist; an example of one was obtained in the course of the proof
of Theorem 1, namely,

$$g(x_1, \dots, x_n) = \max x_i + N_{1-\beta}\,\sigma.$$

It is easily seen from the form of $\beta_0^{(1)}(x_2, \dots, x_n)$ that this bound has at least
β confidence:

$$\beta_0^{(1)}(x_2, \dots, x_n) = \begin{cases} \beta, & \max_{j \ne 1} x_j \le N_\beta, \\ 1, & \max_{j \ne 1} x_j > N_\beta. \end{cases}$$

The confidence level is

$$E\{\beta_0^{(1)}(X_2, \dots, X_n) \mid \max_{j \ne 1} \mu_j \le 0\} \ge \beta.$$

We define a bound g having confidence at least β to be uniformly better than
a bound g′ having confidence at least β if g ≤ g′ for all (x₁, …, xₙ) and
g < g′ on a set of positive measure. It is not difficult to see that bounds uniformly
better than the example above do not exist. This follows from the following
simple property of the normal distribution. Let Y be normal with mean θ and
variance 1; then for δ positive, all the probability less than C can be made
arbitrarily small with respect to the probability in any small neighborhood of
C + δ by taking θ large enough.
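The property of the normal distribution quoted above can be illustrated numerically. In the sketch below, C = 0, δ = 1 and a neighborhood half-width ε = 0.1 are arbitrary illustrative choices, not values from the paper. The code computes Pr(Y < C) / Pr(|Y − (C + δ)| < ε) for Y ~ N(θ, 1) as θ grows. Φ is evaluated via `math.erfc` so that the far left tail does not underflow prematurely.

```python
from math import erfc, sqrt


def std_normal_cdf(x):
    """Phi(x) via erfc, which stays accurate far into the left tail."""
    return 0.5 * erfc(-x / sqrt(2.0))


def tail_ratio(theta, c=0.0, delta=1.0, eps=0.1):
    """Pr(Y < C) / Pr(C + delta - eps < Y < C + delta + eps) for Y ~ N(theta, 1)."""
    below = std_normal_cdf(c - theta)
    near = std_normal_cdf(c + delta + eps - theta) - std_normal_cdf(c + delta - eps - theta)
    return below / near


for theta in (0, 5, 10, 20, 30):
    print(theta, tail_ratio(theta))  # the ratio shrinks towards 0 as theta grows
```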

Since it may be desirable to obtain bounds other than the example given
above, we outline the procedure. For spherically symmetric normal distributions
in Rⁿ having variance σ² and mean (μ₁, …, μₙ) with max μᵢ = 0, we look for
a region whose size is greater than or equal to β and whose characteristic function
φ₀(x₁ + δ, …, xₙ + δ) is monotone nondecreasing in δ. Then a β-level bound
g(x₁, …, xₙ) satisfying Assumption 2.1* is the following:

g(x₁, …, xₙ) = δ′,

where δ′ is the value of δ at which φ₀(x₁ − δ, …, xₙ − δ) jumps from 0 to 1.

3.4. Bounds for nonnormal distributions. As remarked in Section 2, confidence
bounds for max μᵢ may be wanted for distributions other than the normal;
μ₁, …, μₙ would of course be values of the location parameter corresponding
to the random variables X₁, …, Xₙ. Consider the density function f(x − μ);
we shall say it is boundedly complete (one-sided) if

$$\int g(x)\, f(x - \mu)\, dx = 0$$

for any dense set of μ < 0 and |g(x)| < M imply g(x) = 0 almost everywhere.

From a theorem of Lehmann and Scheffé which was mentioned in [1], we can
conclude that if f(x − μ) is boundedly complete (one-sided) then

$$\int \cdots \int g(x_1, \dots, x_n) \prod_i f(x_i - \mu_i)\, dx_i = 0$$

for all μ₁, …, μₙ < 0 and |g(x₁, …, xₙ)| < M imply g(x₁, …, xₙ) = 0
almost everywhere. This conclusion takes the place of Lemma 1 for the following
theorem:
THEOREM 2. If X₁, …, Xₙ are independent and have probability density
functions f(x − μ₁), …, f(x − μₙ), where f(x − μ) is boundedly complete
(one-sided), then there does not exist (unless β = 0, 1, or n = 1) a measurable
function g(x₁, …, xₙ), satisfying Assumptions 2.1 and 2.2, which is an exact β
confidence bound for max μᵢ.

PROOF: The proof is essentially that of Theorem 1.


4. Introduction to Problem B. The second problem is to find a confidence
interval for a set of means. If X₁, …, Xₙ are normally and independently
distributed with known variance σ² and unknown means μ₁, …, μₙ, Problem
B is to find two functions g(x₁, …, xₙ), h(x₁, …, xₙ) such that

$$\Pr\{g(X_1, \dots, X_n) \ge \mu_1, \dots, \mu_n \ge h(X_1, \dots, X_n)\} \ge \beta.$$

We also study the problem of finding an exact β-level confidence interval, for
which the above condition is replaced by

$$\Pr\{g(X_1, \dots, X_n) \ge \mu_1, \dots, \mu_n \ge h(X_1, \dots, X_n)\} = \beta.$$

In Section 5.3 we establish the nonexistence of exact β-level confidence
intervals among pairs of functions (h, g) satisfying several moderate and
reasonable restrictions; these restrictions are:


ASSUMPTION 4.1. The functions g(x₁, …, xₙ) and h(x₁, …, xₙ) satisfy the
equations

$$g(x_1 + \delta, \dots, x_n + \delta) = g(x_1, \dots, x_n) + \delta, \qquad h(x_1 + \delta, \dots, x_n + \delta) = h(x_1, \dots, x_n) + \delta$$

for all x₁, …, xₙ, δ.

ASSUMPTION 4.2. The equation

$$g(x_1, \dots, x_n) = -h(-x_1, \dots, -x_n)$$

holds for all x₁, …, xₙ.

ASSUMPTION 4.3. The functions g(x₁, …, xₙ) and h(x₁, …, xₙ) are symmetric
functions.

ASSUMPTION 4.4. If xᵢ = max xⱼ, then the function g(x₁, …, xₙ) satisfies

$$g(x_1, \dots, x_n) \le g(x_1, \dots, x_{i-1}, x_i + \delta, x_{i+1}, \dots, x_n)$$

for any positive δ.

ASSUMPTION 4.5. For all x₁, …, xₙ, g(x₁, …, xₙ) ≥ x̄ + ε_g, where
x̄ = Σxᵢ/n and ε_g > 0 may depend on g but not on x₁, …, xₙ.

As a corollary to Theorem 3 we obtain a confidence interval for the means
which has at least β confidence; it is

$$(h, g) = \left(\min x_i - N_{\frac{1}{2}(1-\beta)}\,\sigma,\ \max x_i + N_{\frac{1}{2}(1-\beta)}\,\sigma\right), \tag{4.1}$$

where N_α is the α point of the unit normal. Also in Section 5 we indicate the
procedure for constructing intervals having at least β confidence.
5. Analysis of Problem B.


5.1. Justification of assumptions. The first three assumptions (4.1, 4.2 and 4.3)
are obtained by applying the principle of cogredience to the problem. The set
of transformations

x′ᵢ = xᵢ + C, i = 1, …, n, C ∈ R¹,

produces the conditions contained in Assumption 4.1. Similarly the
transformations

x′ᵢ = −xᵢ, i = 1, …, n,
x′ᵢ = x_{jᵢ}, i = 1, …, n,

for all permutations (j₁, …, jₙ) of (1, …, n) produce respectively the
conditions of Assumptions 4.2 and 4.3.

Assumption 4.4 is similar in form and justification to Assumption 2.2.
Assumption 4.5 is not too restrictive for practical confidence intervals: it is
introduced merely from necessity in the proof. Nevertheless, it seems reasonable
to suppose that Assumption 4.5 is not essential for the conclusions of the theorem.

5.2. Characteristic functions. A characteristic function similar to (3.1) could
be defined for the interval (h, g). However, the symmetry introduced by
Assumption 4.2 enables us to use the characteristic function (3.1); for
g(x₁, …, xₙ) in (h, g) we define φ₀(x₁, …, xₙ) as in (3.1).

The present assumptions yield for φ₀(x₁, …, xₙ) the properties derived in
Section 3.1, namely,

(1) φ₀(x₁ + δ, …, xₙ + δ) is monotone nondecreasing as a function of δ, and

(2) for points (x₁, …, xₙ) ∈ Sᵢ, $\phi_0(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \dots, x_n)$ is a
monotone nondecreasing function of xᵢ.

5.3. Theorem for normal distributions. To establish the nonexistence of exact
β-level confidence intervals satisfying Assumptions 4.1 to 4.5, we have

THEOREM 3. If X₁, …, Xₙ are normally and independently distributed with
means μ₁, …, μₙ and variance σ², there does not exist (unless β = 0, 1 or n = 1)
a pair of measurable functions (g, h) which satisfies Assumptions 4.1, 4.2, 4.3, 4.4,
4.5 and which is an exact β-level confidence interval for the set {μ₁, …, μₙ}.

PROOF: The proof is somewhat different from that used in Theorem 1, but
several results obtained in the course of that proof are used here.

Let σ² = 1 without loss of generality. We consider a pair of functions (h, g)
satisfying Assumptions 4.1 to 4.5. Assuming that (h, g) is an exact β-level
confidence interval, we shall derive a contradiction.

For the characteristic function φ₀(x₁, …, xₙ) define according to (3.4) a
conditional expectation $\beta_0^{(i)}(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$. In the following
expressions we shall use a symmetric multivariate normal distribution with
variance 1 and mean given after the condition bars. Using Assumption 4.1, we have

$$\beta = \Pr\{g(X_1, \dots, X_n) \ge 0,\ h(X_1 + \theta_{n-1}, \dots, X_n + \theta_{n-1}) \le 0 \mid (0, -\theta_1, \dots, -\theta_{n-1})\}$$

if we let 0 ≤ θ₁ ≤ ⋯ ≤ θ_{n−1}. Using Assumptions 4.1, 4.2, 4.3,

$$\begin{aligned} 1 - \beta &= E\{1 - \phi_0(X_1, \dots, X_n) \mid (0, -\theta_1, \dots, -\theta_{n-1})\} \\ &\quad + E\{1 - \phi_0(-X_1 - \theta_{n-1}, \dots, -X_n - \theta_{n-1}) \mid (0, -\theta_1, \dots, -\theta_{n-1})\} \\ &\quad - E\{[1 - \phi_0(X_1, \dots, X_n)][1 - \phi_0(-X_1 - \theta_{n-1}, \dots, -X_n - \theta_{n-1})] \mid (0, -\theta_1, \dots, -\theta_{n-1})\} \\ &= E\{1 - \phi_0(X_1, \dots, X_n) \mid (0, -\theta_1, \dots, -\theta_{n-1})\} \\ &\quad + E\{1 - \phi_0(X_1, \dots, X_n) \mid (-\theta_{n-1}, -(\theta_{n-1} - \theta_1), \dots, -(\theta_{n-1} - \theta_{n-2}), 0)\} \\ &\quad - E\{[1 - \phi_0(X_1, \dots, X_n)][1 - \phi_0(-X_1 - \theta_{n-1}, \dots, -X_n - \theta_{n-1})] \mid (0, -\theta_1, \dots, -\theta_{n-1})\}. \end{aligned}$$


If we restrict the values of the θ's it is possible to make the third term on the
right-hand side of the equation equal to zero. From Assumption 4.5 the first
factor 1 − φ₀(x₁, …, xₙ) is equal to zero if g(x₁, …, xₙ) ≥ 0, or if x̄ + ε_g ≥ 0.
Similarly the second factor $1 - \phi_0(-x_1 - \theta_{n-1}, \dots, -x_n - \theta_{n-1})$ is equal to
zero if $-\bar{x} - \theta_{n-1} + \varepsilon_g \ge 0$, or if $\bar{x} + \varepsilon_g \le 2\varepsilon_g - \theta_{n-1}$. The product of the
two factors will certainly be zero if $\theta_{n-1} < 2\varepsilon_g$. Therefore we have

$$\begin{aligned} 1 - \beta &= E\{1 - \phi_0(X_1, \dots, X_n) \mid (0, -\theta_1, \dots, -\theta_{n-1})\} \\ &\quad + E\{1 - \phi_0(X_1, \dots, X_n) \mid (0, -(\theta_{n-1} - \theta_{n-2}), \dots, -(\theta_{n-1} - \theta_1), -\theta_{n-1})\} \end{aligned}$$

for all θ₁, …, θ_{n−1} satisfying $0 \le \theta_1 \le \cdots \le \theta_{n-1} < 2\varepsilon_g$.
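We now derive a property of the conditional expectation
$\beta_0^{(1)}(x_2, \dots, x_n) = B(x_2, \dots, x_n)$. We have

$$\begin{aligned} 1 - \beta &= \frac{1}{(2\pi)^{\frac{1}{2}(n-1)}} \int \cdots \int [1 - B(x_2, \dots, x_n)] \exp\Big[-\tfrac{1}{2} \sum_{i=2}^{n} (x_i + \theta_{i-1})^2\Big]\, dx_2 \cdots dx_n \\ &\quad + \frac{1}{(2\pi)^{\frac{1}{2}(n-1)}} \int \cdots \int [1 - B(x_2, \dots, x_n)] \exp\Big[-\tfrac{1}{2} \sum_{i=2}^{n-1} (x_i + \theta_{n-1} - \theta_{n-i})^2 - \tfrac{1}{2}(x_n + \theta_{n-1})^2\Big]\, dx_2 \cdots dx_n. \end{aligned}$$

We note that the following functions satisfy the conditions for a pdf in $R^{n-1}$:

$$f_1 = \frac{1}{2(2\pi)^{\frac{1}{2}(n-1)}} \left\{ \exp\Big[-\tfrac{1}{2} \sum_{i=2}^{n} (x_i + \theta_{i-1})^2\Big] + \exp\Big[-\tfrac{1}{2} \sum_{i=2}^{n-1} (x_i + \theta_{n-1} - \theta_{n-i})^2 - \tfrac{1}{2}(x_n + \theta_{n-1})^2\Big] \right\}.$$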
For the integral

$$\int_{R^{n-1}} f_i\, dx_2 \cdots dx_n$$

the conditions are satisfied for differentiating any number of times under the
sign of integration with respect to θ₁, …, θ_{n−1}. If we set θ₁ = ⋯ = θ_{n−1} = 0
in the equations

$$\frac{\partial^{r_1 + \cdots + r_{n-1}}}{\partial\theta_1^{r_1} \cdots \partial\theta_{n-1}^{r_{n-1}}} \int_{R^{n-1}} f_i\, dx_2 \cdots dx_n = 0,$$

we obtain equations from which all the moments of f_i (with all θ's equal to zero)
can be obtained. However, the equations do not depend on i; hence f₁ and f₂
have identical moments. But for these multivariate normal moments the density
corresponding to them is unique (generalization of the moment condition on
p. 176 in [2]); therefore

$$1 - B(x_2, \dots, x_n) = \tfrac{1}{2}(1 - \beta), \qquad B(x_2, \dots, x_n) = \tfrac{1}{2}(1 + \beta) = \beta^*.$$


Now if we use part of the proof of Theorem 1, from formula (3.5) to formula
(3.12), we obtain

$$g(x_1, \dots, x_n) = \max x_i + N_{1-\beta^*} = \max x_i + N_{\frac{1}{2}(1-\beta)},$$

and by Assumption 4.2 we have

$$h(x_1, \dots, x_n) = \min x_i - N_{\frac{1}{2}(1-\beta)}.$$

Therefore

$$(h, g) = \left(\min x_i - N_{\frac{1}{2}(1-\beta)},\ \max x_i + N_{\frac{1}{2}(1-\beta)}\right).$$

It is easily seen that for this interval the confidence level is greater than β
(unless β = 0, 1, or n = 1). Since this is a contradiction, the theorem is proved.


5.4. Example of normal confidence intervals. Intervals having at least β
confidence do exist; for example

$$\left(\min x_i - \sigma N_{\frac{1}{2}(1-\beta)},\ \max x_i + \sigma N_{\frac{1}{2}(1-\beta)}\right).$$

For n > 1 the confidence level for this interval is always larger than β, and it
seems reasonable to expect that it is bounded away from β. In other words, the
above interval might be refined by using a constant smaller than $N_{\frac{1}{2}(1-\beta)}$.
The answer to this question will most likely be obtained only by applying a
numerical procedure analogous to that described at the end of Section 3.4.
Any bounds for Problem A can be used to provide an interval for the present
problem. Let 1 − β = α₁ + α₂, where α₁ and α₂ are positive, and let
g₁(x₁, …, xₙ) and g₂(x₁, …, xₙ) be bounds for Problem A with confidence at least
1 − α₁ and 1 − α₂ respectively. Then an interval having at least β confidence is

$$\left(-g_2(-x_1, \dots, -x_n),\ g_1(x_1, \dots, x_n)\right).$$

This follows from the argument at the beginning of the proof of Theorem 3.
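The construction can be made concrete with the normal bounds of Problem A, gₖ(x) = max xᵢ + N_{αₖ}σ, where N_α is the upper α point. This sketch is our own, and the function names are hypothetical. The interval then becomes (min xᵢ − N_{α₂}σ, max xᵢ + N_{α₁}σ), and α₁ = α₂ = ½(1 − β) gives interval (4.1). The exact coverage follows from inclusion-exclusion over the two ways the interval can miss. The union bound guarantees at least 1 − α₁ − α₂ = β.

```python
from math import prod
from statistics import NormalDist


def upper_point(alpha):
    """N_alpha: the upper alpha point of the unit normal."""
    return NormalDist().inv_cdf(1 - alpha)


def interval(xs, alpha1, alpha2, sigma=1.0):
    """(-g2(-x), g1(x)) built from the Problem A bounds max x_i + N_alpha * sigma."""
    return (min(xs) - upper_point(alpha2) * sigma, max(xs) + upper_point(alpha1) * sigma)


def coverage(mus, alpha1, alpha2, sigma=1.0):
    """Exact Pr{lower end <= min mu_i and upper end >= max mu_i}, X_i ~ N(mu_i, sigma^2)."""
    top = max(mus) - upper_point(alpha1) * sigma     # upper end misses iff all X_i < top
    bottom = min(mus) + upper_point(alpha2) * sigma  # lower end misses iff all X_i > bottom
    dists = [NormalDist(mu, sigma) for mu in mus]
    miss_upper = prod(d.cdf(top) for d in dists)
    miss_lower = prod(1.0 - d.cdf(bottom) for d in dists)
    miss_both = prod(max(0.0, d.cdf(top) - d.cdf(bottom)) for d in dists)
    return 1.0 - miss_upper - miss_lower + miss_both  # inclusion-exclusion


beta = 0.90
a = (1 - beta) / 2                      # equal split: interval (4.1)
print(interval([1.3, 0.2, 2.1], a, a))
print(coverage([0.0], a, a))            # n = 1: equals beta up to rounding
print(coverage([0.0, 100.0], a, a))     # two far-apart means: (1 - a)^2 = 0.9025
```

For two widely separated means the level is close to (1 − ½(1 − β))², noticeably above β. This is in line with the remark in Section 5.4 that the interval might be refined.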
SEQUENTIAL MINIMAX ESTIMATION FOR THE RECTANGULAR
DISTRIBUTION WITH UNKNOWN RANGE¹


By J. Kiefer
Cornell University


1. Summary. This paper is concerned with sequential minimax estimation of
the parameter θ (0 < θ < ∞) of the density function (3.1) when the observations
are independently and identically distributed with this density, each observation
costs the same amount c > 0, and the weight function is as given in Section 2. A
procedure requiring a fixed sample size is shown to be a minimax solution for this
problem.
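The role of the weight function W(θ, d) = [(θ − d)/θ]² (Section 2) can be illustrated with a hypothetical example. This is not the paper's derivation, and it makes no claim about which procedure is minimax. Take a fixed sample size n and the scaled-maximum estimator d = k · max xᵢ for observations uniform on (0, θ). Since U = max xᵢ / θ has density n u^(n−1) on (0, 1), the risk is 1 − 2k·n/(n+1) + k²·n/(n+2), and θ drops out entirely. The quadratic is minimised at k = (n+2)/(n+1), where the risk equals 1/(n+1)². A seeded Monte Carlo run at very different θ values confirms that the risk does not depend on θ.

```python
import random
from fractions import Fraction


def risk_scaled_max(n, k):
    """E[((theta - k * M) / theta)^2], M = max of n i.i.d. uniform(0, theta) draws.

    U = M / theta has E U = n/(n+1) and E U^2 = n/(n+2), so theta cancels."""
    k = Fraction(k)
    return 1 - 2 * k * Fraction(n, n + 1) + k * k * Fraction(n, n + 2)


def simulated_risk(n, k, theta, reps=50_000, seed=0):
    """Monte Carlo estimate of the same risk for one particular theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        m = max(rng.uniform(0.0, theta) for _ in range(n))
        total += ((theta - k * m) / theta) ** 2
    return total / reps


n = 5
k_best = Fraction(n + 2, n + 1)       # minimises the quadratic in k
print(risk_scaled_max(n, k_best))     # 1/36, i.e. 1/(n+1)^2
for theta in (0.01, 1.0, 1000.0):
    print(theta, simulated_risk(n, float(k_best), theta))
```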


2. Introduction. An important problem in the theory of statistical decision
functions² is that of minimax sequential estimation of the parameter of an
(unknown) member of a given family of distribution functions when the
observations are taken on chance variables which are independently and identically
distributed and when the cost of taking n observations is cn (with c > 0) regardless
of the way in which they are taken. This problem was solved by Wald [1] for the
case of point estimation of the mean of the rectangular distribution from θ − ½ to
θ + ½ (−∞ < θ < ∞), for weight function W(θ, d) = (θ − d)². The minimax
sequential estimation problem for the normal distribution was solved for a variety
of terminal decision spaces and weight functions by Wolfowitz [2] (see also [3]).
Certain extensions and modifications of the results of both of these cases were
given by Blyth [4].

The present paper is devoted to a problem of sequential minimax estimation 
for the case where the family of possible distribution functions consists of all 
distributions for which the successive observations are independently and 
identically distributed with rectangular density function from 0 to 6 (equation 
(3.1)) for 0g Q = {0|0 < @ < «} and where the cost of taking n observations 
is en(c > 0) regardless of the way in which the observations are taken. The 
object is to estimate 6, the terminal decision space being D = {d|O0 Sd < ~}. 
The weight function is W(6, d) = [(@ — d)/6}’; i.e., the loss incurred by making 
decision d when @ is the true parameter is the square of the fractional error in 
estimating 6. Thus, the minimax problem considered in this paper is that of 
finding a sequential estimation procedure which minimizes supe{cHs(n) + 
E,{(@ — d)/6\'\. A word is in order concerning our choice of weight function. 
The reason we do not study the problem for such weight functions as 
|@—d\, (@ — ad)’, or [(@ — d)*/6] is that for such weight functions the 
supremum of the risk over all @ € Q is infinite for every decision function, so that 


1 Research under a contract with the Office of Naval Research. 


2 See Wald [1] for an exposition of this theory and an explanation of the nomenclature 
used herein. 
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every decision function is minimax. In addition, weight functions which depend 
only on d/@ (such as [(@ — d)/6]”) have a structure which essentially simplifies 
matters when estimating a scale parameter. On the other hand, it does not seem 
convenient in the present case to consider simultaneously a large class of weight 
functions as was possible in the cases of symmetrical densities studied in [2] and 
[4]. We therefore treat only one typical weight function here, noting that the 
same method should be applicable to many others. 

With Q, D, and W(6, d) as described above, we shall prove that there is a 
minimax solution for which a fixed number of observations is taken. Specifically, 
the function r(m) of (3.20) (which is the constant risk corresponding to taking 
a sample of fixed sample size m and then estimating @ by the expression of (2.1) 
with m for me) has at most two minima (if there are two, they are for successive 
values of m; moreover, there is only one minimum for all but a denumerable 
set of values of c). A minimax decision function is given by taking m» observa- 
tions 41, Ye, *** Ymo, Where r(mo) is the minimum of r(m) (if there are two 
minima, at mp and m, + 1, one may randomize in any way between the decisions 
to take mp or mp) + 1 observations); and by then estimating 6 by 


. my + 2 
(2.1) m+ 1 


max (Yi, °** 5 Ymo) 


if m > O (we replace m by m) + 1 throughout (2.1) if the latter number of 
observations is taken when there are two minima), and by 0 if m) = 0. The 
risk corresponding to this decision function is then r(mp) for all values of 6 ¢ Q. 
It follows, incidentally, that this decision function is uniformly best among all 


cogredient procedures (see [4]). It is also a minimax solution for some related 
problems discussed in Section 3 of [4]. 

The method of proof is to calculate a lower bound on the Bayes risk when 
the a priori density on @ is given by (3.4). It follows from (3.24) that as the 
parameter a of (3.4) approaches zero, the corresponding Bayes risk approaches 
r(mo); hence, by an argument like that of [1], p. 167, the procedure described in 
the previous paragraph is a minimax solution. The lower bound (3.24) is caleu- 
lated in detail, since the necessary steps in its calculation differ somewhat from 
those of [1], [2], and [4]. We also note that, in this case of estimating a scale 
parameter, the tool used in [1], [2], and [4] of attempting to attain a ‘uniform 
a priori distribution on the real line” in the location parameter case is replaced 
by trying to attain the “a priori density” 1/6. The proof is somewhat shortened 
by restricting the positive range of \,(@) to values @ < 1. This asymmetry 
manifests itself in the fact that the estimator of (3.7) does not tend to a minimax 
solution as a —\0. ; 

The fact that A.(@) is positive only for @ < 1 also shows that the fixed sample 
procedure described above is minimax for the problem of estimating @ when 
the above setup is altered by making Q = {@|0 < 6 < b}, where0 <b < om: 
the argument of Section 3 shows this for b = 1, and the result for general b = b’ 
follows immediately from the case b = 1 if one considers there the problem of 
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estimating b’@ from the sequence {b’Y;} of chance variables. Similarly, by 
considering for each value of a in Section 3 the problem of estimating baé from 
the sequence {baY;}, one sees that our fixed sample procedure is also minimax 
for the problem of estimating @ when our original setup is altered by making 
Q = {0|b <6 < x}. However, the given procedure is obviously not admis- 
sible if mp > O (or m 2 O in the second case): for example, a trivially better 
procedure in the first case when m > 0 is to estimate @ by b whenever the 
expression of (2.1) is > b. 

Finally, we remark that the problem of estimating 6 for the case where the 
f(y; 9) of (3.1) is replaced by 1/(26) for —@ < y < @, is obviously identical to 
the one we consider: one has only to note that after n observations a sufficient 
statistic is still given by (3.2) if only Y; is replaced by | Y; | fori = 1, --- ,n. 
It is also of interest to note that our problem may be translated (by considering 
T; = e€ "',@ =e’) into that of sequential minimax estimation of the parameter 
¢ of the density e ““® for t > ¢, 0 otherwise (— «© < o < «), when the weight 
function is W(¢, d) = (1 — e “®)?. 


3. Calculations. For brevity, we shall throughout this section state the values 
of density functions and discrete probability functions only over the domains 


where they are positive. Let Yi, Y2, --- be a sequence of independently and 
identically distributed chance variables, each with density function 

(3.1) f(y; 0) = 1/0 O<y <8, 
where 0€ 2 = {0|0 < 0 < =}. Define 

(3.2) A, = wiex TY7 + , Fi. 

Clearly, if observations y,, --- , yx on Y,, --- , Y, are taken, then X, is a 
sufficient statistic for @; i.e., for any a priori probability distribution on Q, the 
a posteriori distribution of @ depends on y,, --+ , y, only through the value 


x, taken on by X,.. Thus, in constructing sequential Bayes solutions, we may 
restrict ourselves to decision functions for which the (perhaps randomized) rule 
for stopping and estimation depends, after n observations, only on z,. The 
density function of X, is given by 


n—1 
(3.3) gn(x; 6) = ee 0<2< 8. 
For 0 < a < 1, we define 
1 1 
(3.4) (6) = ——_——_- -,, <0 s. 
) log (1/a) 6 
If \,(@) is the a priori density function on Q and y;, --- , yn have been observed, 
the a posteriori density of 6 given that X, = z is easily computed to be 
(3.5) h.(0|X, = 2) o —_. 2<0 <1, 


T- 2 


where z = max (a, z) and we note that P{z < 1} = 1. 
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The a posteriori loss (excluding cost of experimentation) if one stops after n 
observations and uses d to estimate @, is 


1 2 
W*(d, 2) = [ (24>) ha(0| Xn = 2) ab 


tas a a | AR ) - (2 ai )| 
2(1 — 2”) n+1 n+2 n+l n+2/]° 


The unique minimum of W*% with respect to d is Pra seen to occur for 


(3.6) 


7 _n+2 1-2" y 
_ ‘ek. ae 2 


the corresponding value of W*, being 


oa n(n + 2) (1 — 2**")? 
(3.8) W,(z)=1- in +1? G-= 2)" 
For n = 0, the integral in (3.6) must be altered by replacing h, by A, ; the final 
expression must be changed accordingly. Equation (3.7) then holds with z = a 
and (3.8) becomes 1 — 2(1 — a)/{(1 + a) log (1/a)}. 

Next we note that when f(y; @) is the density of each Y,, the conditional 
distribution function of X, given that X,.. = wu assigns probability mass u/@ 
at the point zc = wu and density 1/6 for u < x < 6. For n = 1, the distribution 
of X, is of course given by the density f(z; 6). We conclude that if \,(6) is the 
a priori density on Q, the distribution of X, is given by the density 


a l (la) 
| log (1/a) a 


1 a-2) 
\log (1/a) 2 


’ 


ifz Sa, 


1 
(3.9) pi(x) = [ f(x; @)r0(0) dé = 


fa<z <i; 


and that (using (3.5) with n replaced by n — 1), for n > 1, the conditional 
distribution of X, given that \,(@) is the a priori density and X,.. = uw, is given, 
if u S a, by 


(3.10) 


pa(z | u) 


and, if u > a, by 


(3.11) 
“o< 2-€ is 


ie eS a on ae 
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where in each case P, is the probability mass at r = u and p,(z | u) is the 
density elsewhere. 
Equations (3.10) and (3.11) yield for the conditional distribution of Z, = 


max (X,, a) given that \,(@) is the a priori density and that Z,_, = », for all 
n> i, 





Eiteg « 22 i ¢ 


ae mn i—»’ 
nen n—1 inas 
Qn(z Oe: peg * es <2 <i, 


where again q, is a density and Q, is the probability mass at z = v. 

Let W,-:(v) be the conditional expected value of W%*(Z,) given that \.(6) 
is the a priori density and that Z,-. = v (where we define Z) = a). Using (3.8) 
and (3.9), we have 


W.(a) = E{Wt*(Z,)} 
a 1 
ai wi*(a) | pi(z) dz + [ Wt" (z)p,(z) dz 
0 a 


(3.13a) Lic sed* 3 (i-a@), pa-2y \ 
Tee 7a a =) ‘Lai-s 


3 Ty ia. 
t-roa f [$-1+ ES ]e 


sale hates 1 
3 4 log (1/a) oe a (1 a) | ee log (1/a) ° 
For,n > 1, we have from (3.8) and (3.12), 

W-i(v) = E{W%*(Z,) | v} 


A 


1 
W2*(W)Qa(Z = 0) + [ W2*C2)qale | v) ae 


my (n — 1)(n + 2) 


@ + 1d =) 


(f sey +" J, 2a — 2) ae} 
The term in the last set of braces in (3.13b) may be written as 


(3.13b) 


1-7 4+ (1 — vil +o —v — yp?) 


(1 — v™*) 
1 ‘ wae n+1 
+" I [4 . Ga] dz 
(3.14) 7 
> (1 —v™") + —_ ; (li —v*") — of 22 dz 
n 





(1 — vy") — wo "(11 — v’). 
n— 1 
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Wav) <1 — 2 +2) 5 @- Vet 2) | v(1 — 0) 


aia (n + 1) @ +1) =») 


l l 
(w+ i+ fog)’ 


< 


where in the last step we have used the fact that (n — 1)(n + 2)(n + 1) < } 
if n = 2 and < 1 otherwise, that (1 — v*)(1 — v"")" < 2ifn = 2and Ss 1 
otherwise, and that if n > 1 we have v™ S v < (log 1/v)~*. From (3.13a) and 
(3.15), we have for all n > 0, 


f 1 1 
(3.16) Wav) < Gait lax tie)” 


Similarly, we have from (3.8) for n > 1, 


ey) _ (n—1(n+1). (1 — v*) 
Fe ot Ss a 


~ 1 —- @—VOt+D E + | 


ae 
n? log (1/v)’ 


and, for n = 1 (putting v = a), 


4 2(1 — a) 2 
3.18) fe Re eS ee 
( he (I+ a) log (17a) log (17) 
Combining (3.16), (3.17), and (3.18), we have for all m 2 0, 


C 7e* _ ’ an a — — cr. Fae 
sii We (0) — Wal) > CF tym +O) log Oe)” 


We now define, for all integers m = 0, 


1 
(m + 1)?" 


We note that r(m + 1) — r(m) = ¢ — (2m + 3)/((m + 1)°(m + 2)*). The 
function r(m) evidently has at most two minima (if there are two, they are for 
consecutive values of m). Denote by mp» the first integer for which r(m) is a 
minimum. Let e (0 < ¢ < 1) be such that 3e < r(m — 1) — r(m) (if m = 0, 
the last restriction is omitted). Let d = e“‘ anda = oe Let m, be the 


smallest integer not less than 1/c. 


(3.20) r(m) = cm + 
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For any integer K > 0, if \,(@) is the a priori density we have (noting that 


d> a) | 
[ [ gu(x; ar (8) do ie — [ -~ a” 
7 cree a x log (1/a) 


(3.21) P{Xn 2d} 


= log (1/d) ae -a 
log (1/a) K log (1/a) 


We note that, after m observations (m = 0, 1, --- , ad inf. and putting v = a 


€. 


if m = 0), any Bayes solution will certainly prescribe taking another observation 
if W3,*(v) — W,.(v) — ¢ > 0, since this quantity is the a posteriori expected saving 
over stopping after m observations if instead one takes one additional observa- 
tion and then stops and makes the best terminal decision. 

We also note that, since (log 1/a)" = € < «, it follows from (3.21) that, 
when A,(@) is the a priori density, 
P ! en, i eee i 2 ei deamaaiiteatin ak ok 
\ log (1/Z,) <efort = 1, 2, » M ™) = \log fi Bccea,) € 


/ 


( 1 \ 
— ee = 3 m m ' _— . 
Py jog (1/Xagua,) <P = PiXmorm <a} > 1 - 
Since r(m — 1) — r(m) is a decreasing function of m(m > 0) and since 
3€ < r(m) — 1) — r(m), we conclude that, if m > 0, the event 
1 


(3.23) log (1/Zmo+m;) = 
entails the event (log (1/Zm,-1))' < ¢, which entails 3(log (1/Zm -1)) < 
r(my — 1) — r(mo); or, equivalently, —3(log (1/Z,))* + r(i) — ri +1) > 0 
fori = 0,1, --- , mp — 1. Finally, it follows from (3.19) that this entails the 
event Wi*(v) — Wi(v) — c > Ofori = 0,1, --- , mo — 1; and, for any Bayes 
solution relative to \,(@), this entails the event that at least mp observations 
will be taken. Furthermore, the last statement is always true for m) = 0. 

Similarly, we note from (3.17) and (3.18) that the event (3.23) certainly 
entails the event Wi(v) > (1/(1 + 1i)*) — 2e fori = m,m +1, °°: ,m+ 
m,. That is, if a terminal decision is made after exactly i observations (¢ = m, 

- , Mo + m,), the total a posteriori loss plus cost of experimentation will be 
> ci + (1/(1 + 1)”) — 2e = cm + (1/(1 + m)*) — 2e. Moreover, it follows 
from the definition of m, that this last expression is less than the cost of experi- 
mentation alone if more than mp -+ m, observations are taken. 

To summarize, then, the event (3.23) implies for any Bayes solution relative 
to \.(@) that the experiment will terminate with a total a posteriori loss plus 
cost of experimentation exceeding cm) + (1/(1 + mo)*’) — 2e. But it follows 


from (3.22) that (3.23) occurs with probability >1 — «¢. Since me + 
(1/(m + 1)°) S 1, it follows that the Bayes risk relative to \,(@) exceeds 
1 1 
9 ‘it ~~ idiot se Mie 
(3.24) (l-® (moe + me De 2.) > met+ (m+ 1)? 3e 
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Since « may be taken to be arbitrarily small in magnitude, we conclude (see 
Section 2) that the fixed sample procedure described in Section 2 is indeed 
minimax. 
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EXTENSION OF A METHOD OF INVESTIGATING THE PROPERTIES 
OF ANALYSIS OF VARIANCE TESTS TO THE CASE OF RANDOM 
AND MIXED MODELS 


By F. N. Davin ann N. L. JoHNSON 


University College, London 


Summary. Results are given whereby the methods described in an earlier 
paper [1], dealing with the parametric case, may be applied also to the case of 
random, or mixed random and parametric components. 


1. Introduction. In a recent paper [1] we set out a method for approximating 
to the power function of tests of the general linear hypothesis under fairly wide 
conditions of non-normality and non-uniformity of residual variance. In many 
analysis of variance problems, it is more reasonable to replace some or all of the 
parameters by independent random variables with zero expected value. (This 
is the basis of the well-known ‘components of variance’ model.) 

In the present paper we give certain general formulae which will facilitate 
the application of the method described in [1] to such random or mixed models. 
Our results are presented in such a form that they refer to the various sums of 
squares suggested by the analysis appropriate to the parametric case. Since, 
however, the same sums of squares are commonly used (though not necessarily 
in the same way) in the analysis when a random or mixed model is envisaged, 
the results given will be appropriate in such cases, though care must be taken 
in their interpretation. 

It may be noted that this extension of our method covers the case of the 
general linear hypothesis with correlated residuals, since such residuals may be 
represented as the sum of : 

(i) independent residuals for each observation, and 

(ii) independent random terms common to different observations (i.e., occur- 

ring in the same way as do parameters in the general linear model). 


2. The theoretical model. In [1] we used a theoretical model of the form 
i= An, + --> + Q;,s—p9.—p + Qy,s—p+19.— p41 + -** + a0, + 2; 
(i = 1, ho , 2), 
where the 6’s were unknown parameters and the z’s were independent random 


variables each with zero expected value. The hypotbesis to be tested specified 
that p41 = +: = 6 = 0. 


We now replace 6,41, --* , %-p, O-pirti, **: » (QQ < 8 — p,r < p) by 
independent random variables ygii, -** 5 Ye—p» Ys—ptrtt, *°* » Ys (each with 
expected value zero) so that the theoretical model is of form 


= anh, + ++ + Digg + Qi a+iYor+i + “9° Qi ,.s—pYs—p 
+ Gj,.—p+19.-p41 t+ ++ + Ay,e-pirDe—pir 


+ Ay s—p4r41Ye—pirdl + eer + AisYs + Zi. 
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The hypothesis to be tested specifies 


O-rr1 = *°* = Oper = O, 
O(Ye—pirgs) = -*: = ofy) = 0. 
As in [1] it is also assumed that the matrix A = (a;;) is nonsingular and the 


z’s are mutually independent. We further assume that the y’s are independent 
of the z’s. 


3. Method of investigation. It will be recalled that in the parametric case the 


test of the hypothesis H(@,»41 = --: = 6 = 0) was based on the criterion 
(S,/p)/(Sa/(n — 8)), where S, is the minimum value of 
Doles (ti — Gah — «+: — ae6,) 
with respect to 6, --- , 6,; and S, + S, is the minimum value of 
> far (2s — aa — +++ — Oey») 
with respect to 6, --- , &—». The upper 100a% limit of the test criterion 


could be obtained from tables of significance limits of the F-distribution. 
The test could formally be expressed as 


reject H if (S,/p)/(Sa/(n — 8)) > Fopn—s.a- 
Investigation of properties of the test reduces to evaluation of the probability 
P{(Se/p)/(Sa/(n — 8)) > Fp,n-s.a} 
which can be written in the form 
P{S, — CS, > 0}, 


where C = 1 + pF 5.n-s,2/(n — 8s) and S, = S, + S,. This probability is ob- 
tained approximately by finding a frequency curve which has the same first 
four moments as S, — CS,. It is assumed that the theoretical model (1) is 
adequate in the number of parameters and/or random variables which it contains. 
Following our previous work it may be shown that S, and S, may be written 
in canonical form as 


n 
S. = z Mij 21 2; 


ij=3 
and 
S, = ZZ mij (z; + D(z; + D}), 
t,j=1 
where 
e—pt+r s 
Dz = , > i249, + » Ss Git Ye 
t=e— p+1 t=s—ptr+l 
= A;+Y, (¢ = 1,---,n) 


and the m’s and m’’s depend only on the a’s (see Section 4). 
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4. Definition of determinants. Before turning to a consideration of the moments 
it will be convenient to summarise in determinantal form the various quantities 


which are required. 
As before 


n 
Cx — = yj Aik 


i=l 
and 
Gu ++: Ge Gu -** Gis-p 
A= “ A’ = : 
G4, * Ges Gie—p Paid Ci—p, —p 
Let 
1 ain Cia<» 0 ain Gi op 
i ain Gu Gi,.—» aj Gy eee Gi 2p 
Ai = : ai; = ‘ 
Gis—p Gis—p *** Gr_pa—p Qj s—p Gie-p *** Gi_pa—p | 
Then 
mij = —ai/A’ t# J; 


mi = 1 — ay/d! = Ay/d’. 


Similar quantities without primes may be expressed as similar determinants of 
order (s + 1) instead of (s — p + 1). In this present work we shall also use 


0 D ai Ay: De Gi e-p Ai| 


| Gy Cisne = , 
i=; = » es mi; Ai, 
A I=1 
Vis-p Gia—p (s—-p.e—p 
2 
>, A} >, 8: A; -*- 2, Gop lle 
' . ‘ 
1 a ay Ay Gu Gi s—p n 
/ . ‘ 
i = = mis Ay Aj. 
A | . . t,j=1 

a Ginph: Gis» Gectitnn 
| s 


Similar quantities without primes which may be expressed as determinants of 
order (s + 1) will have zero value. So far the determinants are the same or are 
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directly comparable with those of our previous paper. We now introduce new 
determinants and note in these definitions that ¢ and u may run from 
(s — p+r+ 1) tos only. We define 


Geu Gy eae Gia ait ai ve? “Qs 
" 1 Gru Gu — Gis» ’ 1 Gu Gy .** Gijs—p 
lw = a : . . ’ Qiu = y ° ° . ’ 

Go», Gis» ae 1G». Gis» rr Ge-pe—p| 

Zz; ait Aj = Qi, A; °°: Pi ile 
' 1 Git Gu aac Gi.» 
At — 
A . . 0 


G,~». Cs acm aan Go~p.0~p 


Similar determinants of order (s + 1) may be written down to represent 
quantities without primes but these will be zero. 


5. Moments of S, and S,. We write 
u(S: St) = &[(S, — &(S,))'(Sa — &(S.))"] 


with «(S/S%) for the corresponding cumulants. It is easy to see that the moments 
of S, are the same as those indicated in [1] with the appropriate determinants 
now put equal to zero. For example, (all summations running from 1 to n) 


&(Sa) = Qe mis nei 
K(Sa) = De misna + 22 x Mig Koi Key y 
«(Sa) = a mii kei + 12 X x mii Mi; Kai Koj + 6 X x Mii Mj M55 Kai Ka; 
+4 2 2X mi; ksi kaj + 8 x 2X 2 Mij Mit M51 Koi Koj Ku 


and so on, the rth cumulant of z; being defined as «,; for r 2 2. Again it is a 
simple matter to show that «(S,S.) is the same under this treatment as it was 
in [1] if the appropriate changes are made in the determinants. Thus 


«(S,S,) = bo mis Mis kai + 2 ) = i mis Mis Kai Kay + 2 p & mii 55 kai , 
i : J ‘ 


where 6; has A’s instead of D’s in its definition. The moments of S, and the 
cross cumulants of S, and S, containing a power of S, greater than or equal to 
2 can be derived by elementary algebra or by a simple combinatorial method 
from the moments of S, previously obtained. Let x,, be the rth cumulant of y; . 
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We have then 
&(S,) = Do misuse: + Aa + DL Vie kee, 
. t 


where ¢ may run from (s — p + r + 1) tos only. Again 
x(S?) = > mika + 2 > 7 M5 Kai Kaj + 4 te mis bik + 4 Zz 6, Kai 
t ‘ i ‘ ‘ 


+4 2 Vii ke +2 x D Virkukeu + 4 a DicAekse 
+4 ps AG iene + 4 y > Qi Kae Kas” 
t . t 


This last expression demonstrates how the moments of S, can be obtained 
directly by substitution from [1]. We write down the expression for «(S;) from 
[1] and add to it expressions in ¢, or in ¢ and u, which we obtain by substituting 
Kre fOr kei, Ute for mj; , and so on. We add further the terms in &,.«,; by making 
the appropriate substitution for the cumulants and writing Q;, for m‘{;. This 
combinatorial method is obvious if the form of the various determinants is 
considered. We have worked out the cumulants and cross-cumulants up to and 
including those of the fourth order by two different methods but they are so 


easily derived by the above process that we do not reproduce them here in full 
generality. 


6. Special cases of normality. If it is assumed that z; and y, are both normally 
distributed then from a knowledge of the moments it is possible to study the 


effect of heterogeneity of variance on the power function of the test. For the 
special case of normality we have 


&(S_) = 2 Miikai , 
&(S,) = D micas + Ad + De Vetkee, 
x(S3) = 2 a Mis jK2iK25 5 
x(SaS-) = 2 - Mis {Mi jKaiK9 . 
x(S?2) = 2 D> mijneines + 4 Do 877m + 2 Do Vevkerveu + 4D) Adie 


x(Sa) 


8 Yo Mj j{M41M j1k2iK2jK2 , 

«(S,S3) = § Z. My MAM jrraiko;jK2A ’ 

x(S2S,) = § = Mi {M5 AM jrKasK2jK2t +8 ie Mj 35,5 jKaiK2; +8 ~ Mj iD jekaike Kae ’ 
x(S3) = 8 t Mi MUM Kako ;K21 + 8 dD Veal tol cokastaukes 


, Poi ™ Pie , ito ‘ r oft 
+ 24 > Mj jQ ¢Qj Karke jKee + 24 ae OQ; QiuT tukerKerkoun + 24 2 ™M ; 70 40 jK2iK2; 


+ 48 >) Did: Aiwostnr + 24 D> MivAtAuackas , 








PROPERTIES OF ANALYSIS OF VARIANCE TESTS 


od 
x(S,) = 48 7 Mj j{Mj1M jxM uck2iKe jKaiKex » 


x«(S,S3) 48 2: IM; MIM EM Uekaik2 jKaIKER 
«(S3S2) = 48 >> MijsMiM AMudaike ikon + 48 bs MM Qj QD iekoiKa jKarkoe 
+ 32 > Mj 18 48 1K aiK2 KI ’ 
«(S?S,) = 48 z: Mi MM AM ukeike Koike + 96 ~ Mis iD}: Diekoixe karat 
+ 48 ec Mj Vi QiuT tuk 2ikejKarkou + 96 ze Mj jM 1d 58 1KaiKajKat 
+ 96 > mid, QVieA tKaskajKee : 
«(S?) = 48 ye Mi {MAM AM pEK9iK9 jKoIKo + 192 a Mi jM jij: QtekoiKa jKarKe 
+ 288 Do mi Qi QiuT iwrrine sReevou + 192 Dy Vi QiwT oT wokrikerRouer 
+ 48 Do PP eT uwT rwketkovdeetaw + 192 D2 mi jmind 5 woina5xo1 
+ 192 D> 05:5: jxoinejKee + 384 D. mj jQi5jA rwoinasiee 
+ 192 Do DM QiAA Ueitorten + 384 Do VET L5iA unsikerkow 
+ 192 Do Pia teAuAckerkeuier « 
For ease of printing each summation sign stands for one, two, three or four 
separate summations as required by the subscripts. In these summations 7, j, 
land k run from 1 to n, t, u, v and w run from (s — p + r + 1) tos. A further 
simplification will be to let 5; be zero and the summations for ¢, u, v and w run 


from (s — p) to s. In this latter case the alternative hypotheses to that tested 
specify the existence of certain random variables but not any parameters. 


7. Special cases of correlated variables. As an illustration of the use of the 
foregoing theory when the variables are correlated we consider the test for the 
linearity of regression in a bivariate table. The standard case where departure 
from linearity is represented by parameters was studied in [1]. It will now be 
supposed that the deviations from linearity form a simple moving average 
series of random variables. Let z,; be the dependent variable and W, the inde- 
pendent variable (i = 1, --- ,m;;¢ = 1, --+ , 8). We suppose that the model is 


Zu = 0: + (Wi — W) 6. + yr + Ryp_, + 2, 


where 7’ = ¢t + 2 and R is a known constant. We shall assume that x,-(z:;) = Kye 
(i.e., the distribution of z,; depends only on the array). The fundamental sums 
of the squares are 


S. = 2. 2 (tu — z,)°, S, = Zz Zz {tu — 2. — b(W, a W)}’, 
t ‘ t ‘ 


where 


¥ n(W, — W)(%. — 2.) 
~" aio, 


b 
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Evaluation of the determinants gives 


Poo = nN + R’nisi _ N7(n: “ Rnias)’ ~— (> nw:) (new, > Rrezse41)’, 


where w, = W, — W and n,4; = 0. We have, therefore, using the determinants 
a;; Which have been worked out in [1], 


1 wi 
S,) = _-—=- poet 
&( ) Z Ne (1 N > mh =) Kee 


* < E + R Ni+i — ee + Bed! = amt Revs Wes) |i 


with the convention that n.4: = 0. Again it may be shown that 


(me + Rrregs) (Megs + ress) 
N 


(me We + Rrress Wey1) (Meg Wear + Rrege Wise) 


> m wi 


att 
Pera = Rng — 


and 


rv = 


_ (me + Rrega) (Me + Rrrugr) (Me We + Regs Wiy1) (Mu Wu + Rrruys Watt) 
N a4 Ne wi; 

where U = u+2(u=1,--- s)and|T— U| =|t—u| > 1. Alsoifin 

terms‘of_our original notation (Sections 2-6) 7 is in the éth group, 


Q 1 Me + Ruz: we(Ne We + Rnesi Wr41) 
_— N dm wi : 


if 7 is in the (¢ + 1)th group, 


we ie + Rriy, — Wee We + Roig wer) 


=a” 


© age ee > m wi : 


and if 7 is not in the éth or the (¢ + 1)th groups, 
of. a et Rte _ wee We + Renee West) 


ad _ pegs dm wi 


For brevity we write 
1 W: Wu 
ou" FV > m, wi’ 
(me + Rregs) (Mu + Brags) 4 (Mee + Regs Weer) (Mu Wu + Rruss Wass) 
Xtu Re ———— + > 2 , 
N dr wi 
Ne + Rniss 


w,( We + Rrnisr We41) 
y= My eee. 


N 7 Ne we 
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Then 


K(St) = D> nll — oet)’eee + Do nll — 2oudwne + 2 DO nerubiameeren 
+ ¥ (ne + R’nigs — xe)*Ker $2 D0 (me + R’nigs)(rs + R'negs — 2xus)¥e 
+4 D0 Rrigs(Rregs — 2xe041)Rerker4r + 2D xiumornae 
+ Do [me(l — Wrdwre + megR(L — WRYr)earer + DL maburruliter. 


The higher cumulants follow in a similar way. 
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SOME RELATIONS AMONG THE BLOCKS OF SYMMETRICAL GROUP 
DIVISIBLE DESIGNS 


By W. 8. Connor 
National Bureau of Standards! 


1. Summary. It is well known that if every pair of treatments in a symmetrical 
balanced incomplete block design occurs in \ blocks, then every two blocks of 
the design have \ treatments in common. In this paper it will be shown that 
a somewhat similar property holds for symmetrical group divisible designs. 
In the course of the investigation there will be introduced certain matrices which 
are of intrinsic interest. 


2. Introduction. Some of the combinatorial properties of group divisible incom- 
plete block designs were considered in’ [1]. Here we shall need the definition of 
group divisible designs and the three classes into which they fall. An incomplete 
block design with v treatments each replicated r times in b blocks of size k is 
said to be group divisible (GD) if the treatments can be divided into m groups, 
each with n treatments, so that the treatments belonging to the same group 
occur together in A; blocks and the treatments belonging to different groups 
occur together in As blocks, A; # 2 . The three exhaustive and mutually exclusive 
classes into which the GD designs fall are as follows: 


(a) Singular GD designs characterized by r — \; = 0; 
(b) Semi-regular GD designs characterized by r — \; > 0, rk — vA, = 0; and 
(ec) Regular GD designs characterized by r — \, > 0, rk — vd, > O. 


In this paper we shall study classes (b) and (c) for the symmetrical case, 
that is, the case when r = k, or equivalently, b = »v. 


3. The incidence and structural matrices. In [2] there was defined the structural 
matrix for balanced incomplete block designs. We now shall define the incidence 
matrix, and two structural matrices for GD designs. 

Let us consider first the incidence matrix of a GD design, 


fis eoveces 8] 
(3.1) N=]: 


where the rows represent treatments, the columns represent blocks, and n;; = 1 
or 0 according as the ith treatment does or does not occur in the jth block. 
From the conditions satisfied by the design it is easy to see that 


b 
(3.2) > 1; = 9 (¢ = 1,---,»), 
j=) 


? This work was begun while the author was at the University of North Carolina. 
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and 


b 
(3.3) Do mij May =. OF De, 
jul 


according as the ith and uth treatments (¢ * u) do belong or do not belong to 
the same group. 
Throughout the paper let us adopt the convention that the treatments 
n(w — 1) + 1, n(w — 1) + 2, --- , mw shall belong to the wth group (w = 1, 
- ,m). Then 


A B.:---B 
cen 
(3.4) NN’ = 
Die et 
where the elements of the nxn submatrix A are r in the principal diagonal and 


d, elsewhere, and the elements of the nan submatrix B are \. everywhere. Of 
course NN’ contains v = mn rows and columns. 

Now chouse any ¢ S b blocks of the design. Let the submatrix of N which 
corresponds to these ¢t blocks be denoted by No . Let s;, be the number of treat- 
ments common to the jth and uth chosen blocks (j, u = 1, 2, --- , t). Then the 
t X t symmetric matrix 


(3.5) St = NoNo = (8ju) 


is defined to be the intersection structural matrix of the t chosen blocks. The jth 
row or column of S; corresponds to the jth chosen block and the successive 
elements of the jth row or column give the number of treatments which this 
block has in common with the Ist, 2nd, --- , éth chosen blocks. 

We next shall consider another structural matrix. Let sj, denote the number 
of treatments from the wth group which blocks j and u have in common. Then 


(3.6) D s%u = Su; 


(3.7) > 83; = k. 


Now consider the matrix 
(3.8) 


and the product matrix 


(3.9) 
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where the element in the jth row and the uth column is the sum of products of 
the number of treatments which the jth chosen block and the uth chosen block 
contain from each group. We define S{ as the group structural matrix of the t 
chosen blocks. 


4. The characteristic matrix. We shall define an analogue of the characteristic 
matrix which was developed for balanced incomplete block designs in [2]. For 
the remainder of the paper, except for the last section, we shall restrict our 
attention to the regular GD designs. 

Let the columns of N be permuted so that the first ¢ columns correspond to the 
t chosen blocks. Then let the incidence matrix be extended by adjoining ¢ new 
rows, so that the elements of the jth adjoined row are zero, except for the 
jth which is unity. We thus get 


N 
(4.1) M= , 
I, 0 


where J, is the identity matrix of order ¢, and 0 is the ¢ X (b — ¢#) zero matrix. 
Then 


: NN’ No 
(4.2) N, Ni = ’ . 
No I; 
The evaluation of | N,N; | leads to 
(4.3) | MiNi | = (kyr — (rR — wre)" | Ce, 


where the typical element of C; is 


$$c_{ju} = (rk - v\lambda_2)(rk\,\delta_{ju} + \lambda_2 k^2) + (\lambda_1 - \lambda_2)\Big(rk\sum_{w=1}^m s_{jj}^w s_{uu}^w - n\lambda_2 k^2\Big), \tag{4.4}$$

where $\delta_{ju} = r - \lambda_1 - k$ or $-s_{ju}$, according as $j = u$ or $j \ne u$. The matrix $C_t$ is defined as the characteristic matrix of the $t$ chosen blocks. The $j$th row or the $j$th column of $C_t$ corresponds to the $j$th chosen block of the design.

We observe that the characteristic matrix is related to the two structural 
matrices as is described in the following theorem. 

THEOREM 4.1. For the regular GD designs there exists a (1-1) correspondence among the elements of the intersection structural matrix $S_t^1$, the group structural matrix $S_t^2$, and the characteristic matrix $C_t$. This correspondence is given by

$$C_t = rk(rk - v\lambda_2)\big[(r - \lambda_1)I_t - S_t^1\big] + rk(\lambda_1 - \lambda_2)S_t^2 + k^2(r - \lambda_1)\lambda_2 E_t,$$

where $E_t$ is the singular $t \times t$ matrix all of whose elements are unity.
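As a numerical sanity check of Theorem 4.1 and of (4.3), the Python sketch below builds a small regular GD design of our own choosing: six treatments in three groups of two, with blocks consisting of all 15 pairs plus the three within-group pairs repeated, so that $v = 6$, $b = 18$, $r = 6$, $k = 2$, $m = 3$, $n = 2$, $\lambda_1 = 2$, $\lambda_2 = 1$. This toy design and the helper names (`det`, `gram`, `check_theorem_4_1`) are assumptions for illustration, not taken from the paper. For a chosen set of blocks it forms $C_t$ from $S_t^1$ and $S_t^2$ as in Theorem 4.1, computes $|N_1N_1'|$ directly from the adjoined matrix (4.1), and compares it with the right-hand side of (4.3) in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size, result = len(a), Fraction(1)
    for col in range(size):
        pivot = next((i for i in range(col, size) if a[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for i in range(col + 1, size):
            factor = a[i][col] / a[col][col]
            for c in range(col, size):
                a[i][c] -= factor * a[col][c]
    return result


def gram(rows):
    """rows * rows' for a list of equal-length rows."""
    return [[sum(x * y for x, y in zip(p, q)) for q in rows] for p in rows]


# Hypothetical toy regular GD design: 6 treatments in 3 groups of 2; the blocks
# are all 15 pairs plus the 3 within-group pairs once more.
v, m, n, k, r, lam1, lam2 = 6, 3, 2, 2, 6, 2, 1
groups = [set(range(n * w, n * w + n)) for w in range(m)]
blocks = [set(p) for p in combinations(range(v), 2)] + groups
N = [[1 if i in B else 0 for B in blocks] for i in range(v)]  # v x b incidence


def check_theorem_4_1(chosen):
    """Return C_t of Theorem 4.1, |N_1 N_1'| computed directly, and (4.3)."""
    t = len(chosen)
    S1 = [[len(blocks[j] & blocks[u]) for u in chosen] for j in chosen]
    S2 = [[sum(len(blocks[j] & G) * len(blocks[u] & G) for G in groups)
           for u in chosen] for j in chosen]
    A, D = r - lam1, r * k - v * lam2
    C = [[r * k * D * ((A if a == b else 0) - S1[a][b])
          + r * k * (lam1 - lam2) * S2[a][b] + k * k * A * lam2
          for b in range(t)] for a in range(t)]
    # Adjoined rows of (4.1): row j has a single 1 in the column of chosen block j.
    extra = [[1 if col == j else 0 for col in range(len(blocks))] for j in chosen]
    direct = det(gram(N + extra))
    formula = (Fraction(r * k) ** (1 - t) * Fraction(A) ** (m * (n - 1) - t)
               * Fraction(D) ** (m - 1 - t) * det(C))
    return C, direct, formula


C, direct, formula = check_theorem_4_1([0, 1])   # blocks {0,1} and {0,2}
print(C, direct, formula)
```

Exact `Fraction` arithmetic makes the comparison an equality rather than a floating-point tolerance. With more than $b - v = 12$ chosen blocks the direct determinant vanishes, in line with part (ii) of Theorem 4.2.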
For the particular case when $r = k$, the value of $|N_1N_1'|$ as given by (4.3) reduces to

$$|N_1 N_1'| = r^{2(1-t)}(r - \lambda_1)^{m(n-1)-t}(r^2 - v\lambda_2)^{m-1-t}\,|C_t|, \tag{4.5}$$

where the typical element of $C_t$ is

$$c_{ju} = r^2\Big[(r^2 - v\lambda_2)(\delta_{ju} + \lambda_2) + (\lambda_1 - \lambda_2)\Big(\sum_{w=1}^m s_{jj}^w s_{uu}^w - n\lambda_2\Big)\Big]. \tag{4.6}$$


We shall state an analogue of Theorem 3.1 of [2]. The proof is as for that 
theorem. 


THEOREM 4.2. If $C_t$ is the characteristic matrix of any set of $t$ blocks chosen from a regular GD design with parameters $v, b, r, k, m, n, \lambda_1$, and $\lambda_2$, then

(i) $|C_t| \ge 0$ if $t < b - v$,

(ii) $|C_t| = 0$ if $t > b - v$, and

(iii) $(rk)^{1-t}(r - \lambda_1)^{m(n-1)-t}(rk - v\lambda_2)^{m-1-t}\,|C_t|$ is a perfect integral square if $t = b - v$.


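5. Inequalities on $s_{ju}$ for regular symmetrical designs. Let $t = 1$. Since $b = v$ for a symmetrical design, $t = 1 > b - v = 0$; and since the factor outside of $|C_1|$ in (4.5) is positive, it follows from Theorem 4.2 that $|C_1| = c_{11} = 0$. Hence, from (4.6),

$$r^2(\lambda_1 - \lambda_2)\Big[\sum_{w=1}^m (s_{11}^w)^2 - r^2 + v\lambda_2 - n\lambda_2\Big] = 0. \tag{5.1}$$

Since $r^2(\lambda_1 - \lambda_2) \ne 0$,

$$\sum_{w=1}^m (s_{11}^w)^2 = r^2 - v\lambda_2 + n\lambda_2. \tag{5.2}$$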


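Now let $t = 2$. Since $c_{11} = c_{22} = 0$, we have $|C_2| = -c_{12}^2$, so it is necessary by Theorem 4.2 that $c_{12} = c_{21} = 0$. Hence from (4.6),

$$s_{12} = \lambda_2 + \frac{(\lambda_1 - \lambda_2)\,e}{r^2 - v\lambda_2}, \tag{5.3}$$

where

$$e = \sum_{w=1}^m s_{11}^w s_{22}^w - n\lambda_2.$$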


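From (5.2) and the observation that $s_{jj}^w \ge 0$ $(j = 1, 2;\ w = 1, \ldots, m)$, it follows (using the Cauchy-Schwarz inequality for the upper limit) that

$$-n\lambda_2 \le e \le r^2 - v\lambda_2. \tag{5.4}$$

From (5.3) and (5.4), using $r^2 - v\lambda_2 - n(\lambda_1 - \lambda_2) = r - \lambda_1$ (a consequence of $r(r-1) = (n-1)\lambda_1 + n(m-1)\lambda_2$), we obtain

THEOREM 5.1. For a regular symmetrical GD design the number of treatments $s_{ju}$ common to two blocks satisfies the inequalities

$$\frac{\lambda_2(r - \lambda_1)}{r^2 - v\lambda_2} \le s_{ju} \le \lambda_1$$

when $\lambda_1 > \lambda_2$. The inequalities are reversed when $\lambda_1 < \lambda_2$.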
6. The block structure for regular symmetrical GD designs when $r^2 - v\lambda_2$ and $\lambda_1 - \lambda_2$ are relatively prime. We need to consider the distribution of the treatments contained in an initial block $B_1$ among the other blocks. Let $n_j$ be the number of blocks among the remaining $(b - 1)$ blocks which have $j$ treatments in common with $B_1$. Then from the definition of the design we obtain

$$\sum_{j=0}^k n_j = b - 1 = v - 1, \qquad \sum_{j=0}^k j\,n_j = r(k - 1) = r(r - 1). \tag{6.1}$$
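Also consider $M = \sum_{j=0}^k j(j - 1)n_j$, which is twice the total number of times that pairs of treatments of $B_1$ occur together in the other blocks. $M$ is given by

$$M = \sum_{w=1}^m s_{11}^w(s_{11}^w - 1)(\lambda_1 - 1) + \sum_{\substack{z,w=1\\ z\ne w}}^m s_{11}^z s_{11}^w(\lambda_2 - 1). \tag{6.2}$$

From (3.7) and (5.2), since $r = k$,

$$\sum_{w=1}^m s_{11}^w(s_{11}^w - 1) = (n - 1)\lambda_1, \tag{6.3}$$

$$\sum_{\substack{z,w=1\\ z\ne w}}^m s_{11}^z s_{11}^w = (m - 1)n\lambda_2. \tag{6.4}$$

Hence

$$M = (n - 1)\lambda_1(\lambda_1 - 1) + (m - 1)n\lambda_2(\lambda_2 - 1). \tag{6.5}$$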


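Now consider

$$B = \sum_{j=0}^k (j - \lambda_1)(j - \lambda_2)\,n_j. \tag{6.6}$$

Since $\sum_j j^2 n_j = M + \sum_j j\,n_j$, expanding (6.6) and using (6.1), (6.5), and $r(r-1) = (n-1)\lambda_1 + n(m-1)\lambda_2$, we obtain

$$B = 0. \tag{6.7}$$

Hence the following lemma.

LEMMA 6.1. If for a regular symmetrical GD design $n_j$ denotes the number of blocks which have $j$ treatments in common with a given initial block, then

$$B = \sum_{j=0}^k n_j(j - \lambda_1)(j - \lambda_2) = 0.$$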
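The vanishing of $B$ is purely algebraic once (6.1) and (6.5) hold. The sketch below (the helper name `lemma_6_1_B` is hypothetical) evaluates $B$ from those relations alone, via $\sum_j j^2 n_j = M + \sum_j j\,n_j$, so it can be checked over many parameter sets with $v = mn$ and $r(r-1) = (n-1)\lambda_1 + n(m-1)\lambda_2$ without knowing the individual $n_j$.

```python
def lemma_6_1_B(v, r, m, n, lam1, lam2):
    """B of (6.6) evaluated through (6.1) and (6.5) only."""
    sum_nj = v - 1                       # (6.1), first relation
    sum_j_nj = r * (r - 1)               # (6.1), second relation
    M = (n - 1) * lam1 * (lam1 - 1) + (m - 1) * n * lam2 * (lam2 - 1)  # (6.5)
    sum_j2_nj = M + sum_j_nj             # sum j^2 n_j = sum j(j-1) n_j + sum j n_j
    return sum_j2_nj - (lam1 + lam2) * sum_j_nj + lam1 * lam2 * sum_nj


print(lemma_6_1_B(45, 9, 3, 15, 3, 1))   # the v = b = 45 example of this section
```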


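Now let $r^2 - v\lambda_2$ and $\lambda_1 - \lambda_2$ be relatively prime. Since $s_{12}$ is an integer, it follows from (5.3) that $r^2 - v\lambda_2$ divides $e$, so that $s_{12} - \lambda_2$ is an integral multiple of $\lambda_1 - \lambda_2$; hence $s_{12}$ cannot lie in the open interval between $\lambda_1$ and $\lambda_2$. Then every term of $B$ is positive or zero. But since $B = 0$, every term must be zero. We thus get

THEOREM 6.1. If for a regular symmetrical GD design $r^2 - v\lambda_2$ and $\lambda_1 - \lambda_2$ are relatively prime, then any two blocks have either $\lambda_1$ or $\lambda_2$ treatments in common.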
We further observe that even if $r^2 - v\lambda_2$ and $\lambda_1 - \lambda_2$ are not relatively prime, it still may not be possible to choose the elements of $G_t$ of (3.8), subject to the restrictions of (3.7) and (5.2), such that $s_{ju}$ is integral but is neither $\lambda_1$ nor $\lambda_2$. Consider, for example, the GD design with parameters $v = b = 45$, $r = k = 9$, $m = 3$, $n = 15$, $\lambda_1 = 3$, and $\lambda_2 = 1$. The highest common factor of $r^2 - v\lambda_2 = 36$ and $\lambda_1 - \lambda_2 = 2$ is 2. It is clear that the only positive integers which satisfy (3.7) and (5.2), that is, which sum to 9 and have sum of squares 51, are 1, 1, and 7. But then we must have either $\sum_{w=1}^m s_{11}^w s_{22}^w = 51$ or 15, which correspond respectively to $\lambda_1$ and $\lambda_2$.
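The claim about the $v = b = 45$ design can be checked by brute force. The sketch below (the helper name `group_profiles` is hypothetical) lists all nonnegative integer triples allowed by (3.7) and (5.2), forms every possible value of $\sum_w s_{11}^w s_{22}^w$, converts each into $s_{12}$ through (5.3), and also evaluates the lower limit of Theorem 5.1 for these parameters.

```python
from fractions import Fraction
from itertools import product


def group_profiles(r, m, target):
    """Nonnegative integer m-tuples (s^1, ..., s^m) with sum r, as in (3.7),
    and sum of squares equal to target, as in (5.2)."""
    return [p for p in product(range(r + 1), repeat=m)
            if sum(p) == r and sum(x * x for x in p) == target]


v, r, m, n, lam1, lam2 = 45, 9, 3, 15, 3, 1
target = r * r - v * lam2 + n * lam2          # right-hand side of (5.2)
profiles = group_profiles(r, m, target)
inner = sorted({sum(a * b for a, b in zip(p, q))
                for p in profiles for q in profiles})
# s_12 from (5.3), with e = (inner product) - n * lam2
s12 = [lam2 + Fraction((lam1 - lam2) * (x - n * lam2), r * r - v * lam2)
       for x in inner]
lower = Fraction(lam2 * (r - lam1), r * r - v * lam2)   # Theorem 5.1
print(profiles, inner, s12, lower)
```

Only the values 1 and 3, that is $\lambda_2$ and $\lambda_1$, come out, even though 36 and 2 are not relatively prime.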

Now assume that the condition of Theorem 6.1 is met, or more generally, that positive integers do not exist which meet the restrictions of (3.7), (5.2), and Lemma 6.1 and imply values of $s_{ju}$ other than $\lambda_1$ and $\lambda_2$. Then from (6.1) we obtain

$$n_{\lambda_1} + n_{\lambda_2} = v - 1, \qquad \lambda_1 n_{\lambda_1} + \lambda_2 n_{\lambda_2} = r(r - 1), \tag{6.8}$$

whence

$$n_{\lambda_1} = n - 1, \qquad n_{\lambda_2} = (m - 1)n, \tag{6.9}$$

so that with respect to any initial block $B_1$, there are $(n - 1)$ other blocks which have $\lambda_1$ treatments in common with it, and $(m - 1)n$ other blocks which have $\lambda_2$ treatments in common with it.

From (5.3) we see that

$$\sum_{w=1}^m s_{11}^w s_{jj}^w = r + (n - 1)\lambda_1 \tag{6.10}$$

implies that blocks 1 and $j$ have $\lambda_1$ treatments in common, and conversely. But then from (5.2) and (6.10), it follows that

$$\sum_{w=1}^m s_{11}^w s_{jj}^w = \sum_{w=1}^m (s_{11}^w)^2 = \sum_{w=1}^m (s_{jj}^w)^2, \tag{6.11}$$

which, by the condition for equality in the Cauchy-Schwarz inequality, implies that $s_{11}^w = s_{jj}^w$ $(w = 1, \ldots, m)$ for every such $j$. Hence, if blocks $B_1$ and $B_j$ have $\lambda_1$ treatments in common, and blocks $B_1$ and $B_u$ have $\lambda_1$ treatments in common, then $B_j$ and $B_u$ have $\lambda_1$ treatments in common. We thus have

THEOREM 6.2. If for a regular symmetrical GD design $r^2 - v\lambda_2$ and $\lambda_1 - \lambda_2$ are relatively prime, then the blocks fall into $m$ groups of $n$ blocks each, which are such that any two blocks from the same group contain $\lambda_1$ treatments in common and any two blocks from different groups contain $\lambda_2$ treatments in common.

As has been indicated above, this theorem could be stated somewhat more 
generally. 


7. The semi-regular class. For this class $rk - v\lambda_2 = 0$, and hence the above theory does not apply. We shall give a simple example which demonstrates for small $v$ that there do sometimes exist solutions in which $s_{ju} \ne \lambda_1$ or $\lambda_2$ for some $j$ and $u$.

Consider the GD design with parameters $v = b = 8$, $r = k = 4$, $m = 4$, $n = 2$, $\lambda_1 = 0$, and $\lambda_2 = 2$. One solution is


[Table of the eight blocks of this solution; not legible in the source scan.]


which has the property that the blocks break up into 4 groups of 2 blocks each, which are such that two blocks in the same group have zero treatments in common and any two blocks from different groups have 2 treatments in common. Another solution is


[Table of the eight blocks of this solution; not legible in the source scan.]


which is such that any initial block has 1 treatment in common with each of 
three blocks, 2 treatments in common with each of three blocks, and 3 treat- 
ments in common with one block. 
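To see concretely that both intersection patterns described above occur, the sketch below builds two designs with these parameters from binary codes. This is our own construction and is not claimed to reproduce the solutions printed in the paper: group $w$ is the treatment pair $\{2w, 2w+1\}$ (treatments numbered from 0), and each length-4 binary word picks one treatment from every group. The eight even-weight words give the first pattern (blocks paired off with 0 treatments in common, 2 otherwise); the words $(x, y, z, x \oplus y)$ give the second (1, 2, and 3 treatments in common with three, three, and one other block). The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def design_from_codewords(codewords):
    """Bit w of a word picks treatment 2w or 2w + 1 from group w."""
    return [frozenset(2 * w + bit for w, bit in enumerate(cw)) for cw in codewords]


def is_gd(blocks, v=8, n=2, r=4, lam1=0, lam2=2):
    """Check replication r and concurrences lam1 (same group) or lam2 (different)."""
    if any(sum(i in B for B in blocks) != r for i in range(v)):
        return False
    for a in range(v):
        for b in range(a + 1, v):
            together = sum(a in B and b in B for B in blocks)
            if together != (lam1 if a // n == b // n else lam2):
                return False
    return True


def profile(blocks, j):
    """How many other blocks share 0, 1, 2, ... treatments with block j."""
    return Counter(len(blocks[j] & B) for u, B in enumerate(blocks) if u != j)


cube = list(product((0, 1), repeat=3))
first = design_from_codewords([(x, y, z, x ^ y ^ z) for x, y, z in cube])
second = design_from_codewords([(x, y, z, x ^ y) for x, y, z in cube])
print(is_gd(first), profile(first, 0))
print(is_gd(second), profile(second, 0))
```

Both sets of words form orthogonal arrays of strength 2, which is what makes every between-group pair of treatments occur together in exactly $\lambda_2 = 2$ blocks.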

We shall now obtain inequalities for the number of treatments $s_{ju}$ in common to any two blocks of a symmetrical semi-regular GD design. Since for a semi-regular GD design $rk = v\lambda_2$, it follows that $r - \lambda_1 = n(\lambda_2 - \lambda_1)$, from which, since $r - \lambda_1 > 0$, we obtain the following lemma.

LEMMA 7.1. For a semi-regular GD design, it is necessary that $\lambda_2 > \lambda_1$.

Now let $r = k$. Choose any two blocks and let the columns of $N$ be permuted so that the first two columns correspond to the chosen blocks. Then to $N$ affix $m$ new columns, the $w$th of which contains $(\lambda_2 - \lambda_1)^{1/2}$ in the rows which correspond to the treatments of the $w$th group $(w = 1, \ldots, m)$, and zero elsewhere. Let the augmented matrix be denoted by $N_2$, so that $N_2N_2' = (r - \lambda_1)I_v + \lambda_2E_v$. Now form

$$N_3 = \begin{pmatrix} N_2 \\ \begin{matrix} I_2 & 0 \end{matrix} \end{pmatrix}, \tag{7.1}$$

where $I_2$ is the identity matrix of order 2 and $0$ is the $2 \times (b + m - 2)$ matrix all of whose elements are zero. Then


$$|N_3 N_3'| = (r + v\lambda_2 - \lambda_1)^{-1}(r - \lambda_1)^{v-3}\,|B_2|, \tag{7.2}$$

where $B_2$ is a $2 \times 2$ matrix with elements

$$b_{11} = b_{22} = (r + v\lambda_2 - \lambda_1)(-\lambda_1) + \lambda_2 r^2, \qquad b_{12} = b_{21} = (r + v\lambda_2 - \lambda_1)(-s_{12}) + \lambda_2 r^2. \tag{7.3}$$

As for Theorem 4.2 it is necessary that $|N_3N_3'| \ge 0$, and since the factor outside of $|B_2|$ in (7.2) is positive, it is necessary that $|B_2| \ge 0$. Hence the following theorem:

THEOREM 7.1. For a symmetrical semi-regular GD design, the number of treatments $s_{12}$ common to two blocks satisfies the inequalities

$$\lambda_1 \le s_{12} \le \frac{2\lambda_2 r^2}{r + v\lambda_2 - \lambda_1} - \lambda_1.$$
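For the $v = b = 8$ example of this section the condition $|B_2| \ge 0$ can be evaluated directly from the entries (7.3). The sketch below (names hypothetical) lists the values of $s_{12}$ in $0, \ldots, r$ that keep $|B_2|$ nonnegative and compares them with the limits $\lambda_1 \le s_{12} \le 2\lambda_2 r^2/(r + v\lambda_2 - \lambda_1) - \lambda_1$, which is what $|B_2| = b_{11}^2 - b_{12}^2 \ge 0$ reduces to when $b_{11} \ge 0$.

```python
from fractions import Fraction


def b2_det(r, v, lam1, lam2, s12):
    """|B_2| built from the entries (7.3)."""
    d = r + v * lam2 - lam1
    b11 = d * (-lam1) + lam2 * r * r
    b12 = d * (-s12) + lam2 * r * r
    return b11 * b11 - b12 * b12


def theorem_7_1_bounds(r, v, lam1, lam2):
    """Lower and upper limits for s_12 implied by |B_2| >= 0."""
    upper = Fraction(2 * lam2 * r * r, r + v * lam2 - lam1) - lam1
    return lam1, upper


r, v, lam1, lam2 = 4, 8, 0, 2          # the v = b = 8 example of this section
allowed = [s for s in range(r + 1) if b2_det(r, v, lam1, lam2, s) >= 0]
low, high = theorem_7_1_bounds(r, v, lam1, lam2)
print(allowed, low, high)
```

Here the upper limit is $16/5$, so only $s_{12} \in \{0, 1, 2, 3\}$ survive, which are exactly the intersection numbers occurring in the two solutions described earlier in this section.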


I wish to express my thanks to Professor R. C. Bose for suggesting this 
problem. 
AN OPTIMUM SOLUTION TO THE k-SAMPLE SLIPPAGE PROBLEM
FOR THE NORMAL DISTRIBUTION¹

By EDWARD PAULSON
University of Washington


0. Summary. A slippage problem for normal distributions is formulated as a 
multiple decision problem, and a solution is obtained which has certain optimum 
properties. The discussion is confined to the fixed sample case with the same 
number of observations from each distribution, and the normal distributions 
involved are assumed to have a common but unknown variance. 


1. Introduction. This paper will consider the problem of how to compare k categories, such as k varieties of wheat, k machines, k teaching methods, etc., so as to decide on the basis of a random sample of n observations from each category whether or not the categories are equal, and if not, which is the 'best' one. A problem of this type has been discussed by Mosteller [1] for the nonparametric case. In previous papers [2], [3], we considered some different types of multiple-decision problems arising in the comparison of k categories, and the emphasis was on studying the distribution problems involved when the statistical procedures used were suggested by intuitive considerations. In this paper we will be primarily concerned with finding a statistical procedure which in some reasonable sense is an 'optimum' one.

In this paper we will restrict our attention to the case where the n observations $x_{i1}, x_{i2}, \ldots, x_{in}$ in the $i$th category $\Pi_i$ are assumed to be normally and independently distributed with mean $m_i$ and a common standard deviation $\sigma$, and the best category is (for convenience) defined to be the one associated with the greatest mean value. Let $D_0$ denote the decision that the k means are all equal, and let $D_j$ $(j = 1, 2, \ldots, k)$ denote the decision that $D_0$ is incorrect and $m_j = \max(m_1, m_2, \ldots, m_k)$. Our problem is to find a statistical procedure for choosing one of the $k + 1$ decisions $(D_0, D_1, \ldots, D_k)$ which will be in some sense an optimum one. At this point, instead of introducing a weight function as required in the general theory as developed by Wald [4], we will follow a simpler plan which is somewhat analogous to the classical Neyman-Pearson theory of testing a hypothesis, and attempt to find a statistical procedure which, subject to certain restrictions, will in certain instances maximize the probability of making the correct decision.

¹ Work done under the sponsorship of the Office of Naval Research.

In order to give a more precise formulation and the solution to the problem, let $x_{i\alpha}$ denote the $\alpha$th observation in the sample from $\Pi_i$ $(i = 1, 2, \ldots, k;\ \alpha = 1, 2, \ldots, n)$, let $\bar x_i = \sum_{\alpha=1}^n x_{i\alpha}/n$, $\bar x = \sum_{i=1}^k \bar x_i/k$, $s^2 = \sum_{i=1}^k\sum_{\alpha=1}^n (x_{i\alpha} - \bar x_i)^2/[k(n-1)]$, and let $M$ be the subscript of the category with the greatest sample mean, so that $\bar x_M = \max\{\bar x_1, \bar x_2, \ldots, \bar x_k\}$. We will say that the category $\Pi_i$ has slipped to the right by an amount $\Delta$ $(\Delta > 0)$ if $m_1 = m_2 = \cdots = m_{i-1} = m_{i+1} = \cdots = m_k = m$ and $m_i = m + \Delta$. The first formulation of the problem is the following: to find a statistical procedure for selecting one of the decisions $(D_0, D_1, \ldots, D_k)$ which will maximize the probability of making the correct decision when some category has slipped to the right, subject to the restriction (a) when all the means are equal, $D_0$ should be selected with probability $1 - \alpha$ (where $\alpha$ is some small positive number fixed in advance of the experiment). In this formulation, the class of allowable decision procedures seems to be too large to admit of an optimum solution, and we will therefore limit the class of allowable statistical procedures by the following additional restrictions: (b) the decision procedure must be invariant if a constant is added to all the observations, (c) the decision procedure must be invariant when all the observations are multiplied by a positive constant, and (d) the decision procedure must be symmetric in the sense that the probability of making the correct decision when category $\Pi_i$ has slipped to the right by an amount $\Delta$ must be the same for $i = 1, 2, \ldots, k$. These additional restrictions are rather weak and seem to be reasonable requirements to impose in many practical problems. The problem is now reformulated as follows: to find a statistical procedure for selecting one of the set $(D_0, D_1, \ldots, D_k)$ which, subject to restrictions (a), (b), (c), and (d), will maximize the probability of making the correct decision when one of the categories has slipped to the right. The optimum solution will be shown to be the following procedure:
following procedure: 


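$$\text{if } \frac{n(\bar x_M - \bar x)}{\sqrt{\sum_{i=1}^k\sum_{\alpha=1}^n (x_{i\alpha} - \bar x)^2}} > \lambda_\alpha, \text{ select } D_M;$$

$$\text{if } \frac{n(\bar x_M - \bar x)}{\sqrt{\sum_{i=1}^k\sum_{\alpha=1}^n (x_{i\alpha} - \bar x)^2}} \le \lambda_\alpha, \text{ select } D_0. \tag{1}$$

Here $\lambda_\alpha$ is a constant whose precise value is determined by requirement (a), and does not depend on $\Delta$ or $\sigma$. Since for a given k and n the value of $\lambda_\alpha$ depends only on $\alpha$, the optimum property of (1) holds uniformly in $\Delta$ and $\sigma$.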
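A minimal sketch of procedure (1) in Python, assuming k samples of equal size n given as lists and a value of $\lambda_\alpha$ supplied by the user (for instance from the approximation of Section 3); the function name `slippage_decision` is hypothetical. It returns 0 for $D_0$ or $M$ (counting categories from 1) for $D_M$, together with the statistic itself.

```python
import math


def slippage_decision(samples, lam_alpha):
    """Procedure (1): return (decision, statistic), where decision 0 means D_0
    and decision M (1-based) means D_M, the category with the largest mean."""
    k, n = len(samples), len(samples[0])
    means = [sum(s) / n for s in samples]
    grand = sum(means) / k
    total_ss = sum((x - grand) ** 2 for s in samples for x in s)
    best = max(range(k), key=lambda i: means[i])
    stat = n * (means[best] - grand) / math.sqrt(total_ss)
    return (best + 1 if stat > lam_alpha else 0), stat
```

Because the statistic is a difference of means divided by the square root of a sum of squares about the grand mean, adding a constant to every observation or multiplying all observations by a positive constant leaves it unchanged, which is how restrictions (b) and (c) show up in practice.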


2. Derivation of the optimum procedure. There is obviously no loss of generality in only considering statistical procedures which depend on the set $(\bar x_1, \bar x_2, \ldots, \bar x_k, s)$, since these constitute a set of sufficient statistics for the unknown parameters $(m_1, m_2, \ldots, m_k, \sigma^2)$. Making use of this in connection with restrictions (b) and (c), it is easy to see that any allowable decision procedure will depend only on the $k - 1$ statistics $(\bar x_1 - \bar x_k)/s, (\bar x_2 - \bar x_k)/s, \ldots, (\bar x_{k-1} - \bar x_k)/s$. Let $w_\alpha = (\bar x_\alpha - \bar x_k)/s$ and let $a_\alpha = (m_\alpha - m_k)/\sigma$ for $\alpha = 1, 2, \ldots, k - 1$. The joint probability distribution of the set $(w_1, w_2, \ldots, w_{k-1})$ depends only on the parameters $(a_1, a_2, \ldots, a_{k-1})$. Let $D_0'$ denote the decision that $a_1 = a_2 = \cdots = a_{k-1} = 0$, and for $1 \le j \le k - 1$ let $D_j'$ denote the decision that $a_1 = \cdots = a_{j-1} = a_{j+1} = \cdots = a_{k-1} = 0$ and $a_j = \Delta/\sigma$, while $D_k'$ denotes the decision that $a_1 = a_2 = \cdots = a_{k-1} = -\Delta/\sigma$. Since any allowable decision procedure for
selecting one of the set $(D_0, D_1, \ldots, D_k)$ must be a function only of $(w_1, w_2, \ldots, w_{k-1})$, it can be transformed in a natural manner into a decision procedure for selecting one of the decisions $(D_0', D_1', \ldots, D_k')$ by making $D_i'$ correspond to $D_i$ for $i = 0, 1, 2, \ldots, k$; that is, whenever the original decision procedure selects $D_i$ the transformed decision procedure is to select $D_i'$. Because of restriction (a), the probability that any transformed allowable decision procedure will select $D_0'$ when $a_1 = a_2 = \cdots = a_{k-1} = 0$ will be equal to $1 - \alpha$; in addition, the probability that any allowable decision procedure will select $D_i$ when $\Pi_i$ has slipped to the right by an amount $\Delta$ is equal to the probability that the transformed procedure selects $D_i'$ when $D_i'$ is the correct decision, and this last probability must be the same for each $i$ because of restriction (d).

The proof that (1) is the optimum solution consists mainly in showing that for any $\Delta$ and $\sigma$ there exists a set of nonzero a priori probabilities $g_0, g_1, \ldots, g_k$, which are functions of $\Delta$ and $\sigma$, such that when (1) is transformed in the manner indicated above into a decision procedure for selecting one of $(D_0', D_1', \ldots, D_k')$, it will maximize the probability of making the correct decision among the set $(D_0', D_1', \ldots, D_k')$ when $g_i$ is the a priori probability that $D_i'$ is the correct decision. Assuming this has been demonstrated, it follows easily that (1) must be the optimum solution. For suppose there existed an allowable decision procedure $D^*$ which for some $\Delta$ and $\sigma$ had a greater probability than (1) of making the correct decision when some category had slipped to the right by an amount $\Delta$. Then $D^*$, which must be a function only of $(w_1, w_2, \ldots, w_{k-1})$, when transformed in the indicated manner into a decision procedure for selecting one of $(D_0', D_1', \ldots, D_k')$, will have a greater probability than (1) of making the correct decision among $(D_0', D_1', \ldots, D_k')$ with respect to any set of nonzero a priori probabilities, which would be a contradiction.

To show that the required a priori distribution exists, first let $u_\alpha = (\bar x_\alpha - \bar x_k)/\sigma$ $(\alpha = 1, 2, \ldots, k-1)$, so that $w_\alpha = u_\alpha\sigma/s$. The random variables $(u_1, u_2, \ldots, u_{k-1})$ can easily be verified to have a $(k-1)$-dimensional multivariate normal distribution with common variance $2/n$, common correlation $1/2$, and mean values $(a_1, a_2, \ldots, a_{k-1})$. By an elementary calculation, the joint probability density function of $u_1, u_2, \ldots, u_{k-1}$ is given by

$$C_1\exp\Big[-\tfrac12\Big\{A\sum_{\alpha=1}^{k-1}(u_\alpha - a_\alpha)^2 + B\sum_{\alpha\ne\beta}(u_\alpha - a_\alpha)(u_\beta - a_\beta)\Big\}\Big],$$

where $A = (k-1)n/k$, $B = -n/k$, and $C_1$ is a constant whose precise value is not needed. Using this result plus the known facts that $n's^2/\sigma^2$ has the $\chi^2$ distribution with $n' = k(n-1)$ degrees of freedom and is independent of the set $u_1, u_2, \ldots, u_{k-1}$, the joint probability density function $f(w_1, w_2, \ldots, w_{k-1})$ of $w_1, \ldots, w_{k-1}$ is easily found to be given by

$$f(w_1, \ldots, w_{k-1}) = C_2\int_0^\infty y^{n'+k-2}\exp\Big[-\tfrac12\Big\{n'y^2 + A\sum_{\alpha=1}^{k-1}(w_\alpha y - a_\alpha)^2 + B\sum_{\alpha\ne\beta}(w_\alpha y - a_\alpha)(w_\beta y - a_\beta)\Big\}\Big]dy, \tag{2}$$

where the variable of integration $y$ corresponds to $s/\sigma$ and $C_2$ is again a constant.
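The constants $A$ and $B$ can be checked by inverting the covariance matrix of $(u_1, \ldots, u_{k-1})$, which has $2/n$ on the diagonal and $1/n$ off it. The sketch below (helper names hypothetical) does this in exact arithmetic. It also records two identities used later in the derivation: $A - B = n > 0$, and $A + (k-1)B = 0$, the latter being what reduces the comparison of $f_1$ with $f_k$ to the sign of $w_1$.

```python
from fractions import Fraction


def precision_entries(k, n):
    """A (diagonal) and B (off-diagonal) of the inverse covariance matrix."""
    return Fraction((k - 1) * n, k), Fraction(-n, k)


def check_inverse(k, n):
    """True if the matrix with entries A, B inverts Cov(u) = (I + E) / n."""
    size = k - 1
    A, B = precision_entries(k, n)
    cov = [[Fraction(2 if a == b else 1, n) for b in range(size)]
           for a in range(size)]
    prec = [[A if a == b else B for b in range(size)] for a in range(size)]
    prod = [[sum(cov[a][c] * prec[c][b] for c in range(size))
             for b in range(size)] for a in range(size)]
    identity = [[Fraction(int(a == b)) for b in range(size)] for a in range(size)]
    return prod == identity
```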


Let $f_j = f(w_1, \ldots, w_{k-1} \mid D_j')$ denote the joint probability density function of $w_1, \ldots, w_{k-1}$ when $D_j'$ is the correct decision. The decision procedure which will maximize the probability of making the correct decision among the set $(D_0', D_1', \ldots, D_k')$ when the a priori probability distribution is $(p_0, p_1, \ldots, p_k)$, that is, the Bayes solution with respect to $(p_0, p_1, \ldots, p_k)$, is known [4] to be given by the rule: for each $j$ $(j = 0, 1, \ldots, k)$ select $D_j'$ for all points in the $w_1 \cdots w_{k-1}$ space where $p_jf_j = \max\{p_0f_0, p_1f_1, \ldots, p_kf_k\}$. For the problem at hand, this is the unique Bayes solution except possibly for a set of measure zero according to all $f_j$. Using (2) it is easy to calculate for each $j$ the region where $D_j'$ is selected for the special a priori distribution $p_0 = 1 - kp$, $p_1 = p_2 = \cdots = p_k = p$. For example, the region where $D_1'$ is selected is given by the points in the $w$ space where $f_1 > f_2, f_1 > f_3, \ldots, f_1 > f_k$, and $pf_1 > (1 - kp)f_0$. For any $j$ with $1 < j < k$, the region where $f_1 > f_j$ is given by


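$$\int_0^\infty y^{n'+k-2}\exp\Big[-\tfrac12\Big(n'y^2 + Ay^2\sum_{\alpha=1}^{k-1}w_\alpha^2 + By^2\sum_{\alpha\ne\beta}w_\alpha w_\beta + A\frac{\Delta^2}{\sigma^2} - 2B\frac{\Delta}{\sigma}y\sum_{\alpha=1}^{k-1}w_\alpha\Big)\Big]\Big\{\exp\Big[(A - B)\frac{\Delta}{\sigma}yw_1\Big] - \exp\Big[(A - B)\frac{\Delta}{\sigma}yw_j\Big]\Big\}dy > 0.$$

The integrand is positive for all $y$ in the range $0 < y < \infty$ if $w_1 > w_j$, and negative for all $y$ in this range when $w_1 < w_j$ (since $A - B > 0$), so that $f_1 > f_j$ for $1 < j < k$ if and only if $w_1 > w_j$. In a similar manner (using $A + (k-1)B = 0$), it is easy to show that $f_1 > f_k$ if and only if $w_1 > 0$. The region where $pf_1 > (1 - kp)f_0$ is given by

$$\int_0^\infty y^{n'+k-2}\exp\Big[-\frac{y^2}{2}\Big(n' + A\sum_{\alpha=1}^{k-1}w_\alpha^2 + B\sum_{\alpha\ne\beta}w_\alpha w_\beta\Big)\Big]\Big\{p\exp\Big(-\frac{A\Delta^2}{2\sigma^2}\Big)\exp\Big[\frac{\Delta}{\sigma}y\Big((A - B)w_1 + B\sum_{\alpha=1}^{k-1}w_\alpha\Big)\Big] - (1 - kp)\Big\}dy > 0.$$

Making a change of variable, this region is equivalent to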
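$$\int_0^\infty t^{n'+k-2}\exp\Big(-\frac{t^2}{2}\Big)\Big\{p\exp\Big(-\frac{A\Delta^2}{2\sigma^2}\Big)\exp\Big[\frac{\Delta}{\sigma}\,t\,h(w_1, \ldots, w_{k-1})\Big] - (1 - kp)\Big\}dt > 0,$$

where

$$h(w_1, w_2, \ldots, w_{k-1}) = \frac{(A - B)w_1 + B\sum_{\alpha=1}^{k-1}w_\alpha}{\sqrt{n' + A\sum_{\alpha=1}^{k-1}w_\alpha^2 + B\sum_{\alpha\ne\beta}w_\alpha w_\beta}}.$$

The integrand on the left-hand side is, for all $t$, a monotonically increasing function of $h(w_1, \ldots, w_{k-1})$, so the region where $pf_1 > (1 - kp)f_0$ must be of the type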
$h(w_1, \ldots, w_{k-1}) > L$, where $L$ is a number which depends on $\Delta/\sigma$ and $p$. The other regions can be calculated explicitly in a similar manner, and the Bayes solution is the following procedure: for $1 \le j \le k - 1$ select $D_j'$ if $w_j > 0$ and $w_j > \max(w_1, \ldots, w_{j-1}, w_{j+1}, \ldots, w_{k-1})$ and

$$(A - B)w_j + B\sum_{\alpha=1}^{k-1}w_\alpha > L\sqrt{n' + A\sum_{\alpha=1}^{k-1}w_\alpha^2 + B\sum_{\alpha\ne\beta}w_\alpha w_\beta};$$

select $D_k'$ if $w_j < 0$ for $j = 1, 2, \ldots, k - 1$ and

$$[-A - B(k - 2)]\sum_{\alpha=1}^{k-1}w_\alpha > L\sqrt{n' + A\sum_{\alpha=1}^{k-1}w_\alpha^2 + B\sum_{\alpha\ne\beta}w_\alpha w_\beta};$$

otherwise select $D_0'$. Define the function $F(p)$ by the equation


$$F(p) = \int_0^\infty t^{n'+k-2}\exp\Big(-\frac{t^2}{2}\Big)\Big\{p\exp\Big(-\frac{A\Delta^2}{2\sigma^2}\Big)\exp\Big(\frac{\Delta}{\sigma}\lambda_\alpha t\Big) - (1 - kp)\Big\}dt,$$

where $\lambda_\alpha$ is the constant used in (1). It is obvious that $F(p)$ is a continuous function of $p$ with $F(0) < 0$ and $F(1/k) > 0$. Hence there exists a value $p^*$ with $0 < p^* < 1/k$, which is a function of $\Delta/\sigma$, such that $F(p^*) = 0$. Once the Bayes solution relative to $(1 - kp, p, p, \ldots, p)$ has been worked out, it is obvious that to get the Bayes solution relative to $(1 - kp^*, p^*, \ldots, p^*)$ it is only necessary to replace $L$ by $\lambda_\alpha$. If we now substitute $w_j = (\bar x_j - \bar x_k)/s$ and replace $A$ and $B$ by their values, we find after some algebraic simplifications that the Bayes solution relative to $(1 - kp^*, p^*, \ldots, p^*)$ reduces to (1) when $D_j'$ is made to correspond to $D_j$. Since (1) is an allowable procedure, this proves that it is an optimum one.
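The closing algebraic step can be confirmed numerically. With $A - B = n$, the numerator $(A - B)w_j + B\sum_\alpha w_\alpha$ becomes $n(\bar x_j - \bar x)/s$, and the quantity under the square root becomes the total sum of squares about $\bar x$ divided by $s^2$. The Python sketch below (function names hypothetical, data simulated with a fixed seed) evaluates both sides for every $j$, including $j = k$, where the numerator is $[-A - B(k-2)]\sum_\alpha w_\alpha$.

```python
import math
import random


def h_bayes(samples, j):
    """Bayes-region statistic in terms of w_a = (xbar_a - xbar_k)/s; j is 1-based."""
    k, n = len(samples), len(samples[0])
    means = [sum(s) / n for s in samples]
    within = sum((x - means[i]) ** 2 for i, s in enumerate(samples) for x in s)
    n_prime = k * (n - 1)
    s = math.sqrt(within / n_prime)
    w = [(means[a] - means[-1]) / s for a in range(k - 1)]
    A, B = (k - 1) * n / k, -n / k
    sum_w, sum_sq = sum(w), sum(x * x for x in w)
    cross = sum_w ** 2 - sum_sq              # sum over a != b of w_a * w_b
    denom = math.sqrt(n_prime + A * sum_sq + B * cross)
    if j < k:
        num = (A - B) * w[j - 1] + B * sum_w
    else:
        num = (-A - B * (k - 2)) * sum_w
    return num / denom


def statistic_1(samples, j):
    """n (xbar_j - xbar) / sqrt(total sum of squares), the statistic of (1)."""
    k, n = len(samples), len(samples[0])
    means = [sum(s) / n for s in samples]
    grand = sum(means) / k
    total = sum((x - grand) ** 2 for s in samples for x in s)
    return n * (means[j - 1] - grand) / math.sqrt(total)


rng = random.Random(7)
data = [[rng.gauss(i, 1.0) for _ in range(4)] for i in range(3)]
print([round(h_bayes(data, j) - statistic_1(data, j), 12) for j in (1, 2, 3)])
```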


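3. The calculation of $\lambda_\alpha$. The calculation of the exact value of $\lambda_\alpha$ required in order to have

$$P\Big\{n(\bar x_M - \bar x) > \lambda_\alpha\sqrt{\textstyle\sum_{i=1}^k\sum_{j=1}^n(x_{ij} - \bar x)^2}\Big\} = \alpha$$

when all k means are equal will be extremely difficult until tables are made available, and therefore some approximation is required at present. For this purpose let $A_i$ denote the event $\big[n(\bar x_i - \bar x) > \lambda_\alpha\sqrt{\sum_{i}\sum_{j}(x_{ij} - \bar x)^2}\big]$, so that the probability above will be equal to the probability of the occurrence of at least one $A_i$ $(i = 1, 2, \ldots, k)$. The approximation to be suggested is of a familiar type, and consists in determining $\lambda_\alpha$ so that $P(A_1) = \alpha/k$. For this purpose, it is clearly legitimate to take $m_1 = m_2 = \cdots = m_k = 0$ and $\sigma = 1$. Next, let $y_j = \sqrt n\,\bar x_j$ $(j = 1, 2, \ldots, k)$, so that $\{y_j\}$ constitute a set of independent and standardized normal variables, and let $\bar y = \sum_{i=1}^k y_i/k$. Then, since $n(\bar x_1 - \bar x) = \sqrt n(y_1 - \bar y)$ and $\sum_i\sum_j(x_{ij} - \bar x)^2 = \sum_i\sum_j(x_{ij} - \bar x_i)^2 + \sum_i(y_i - \bar y)^2$,

$$P(A_1) = P\Big\{y_1 - \bar y > \frac{\lambda_\alpha}{\sqrt n}\sqrt{\sum_{i=1}^k\sum_{j=1}^n(x_{ij} - \bar x_i)^2 + \sum_{i=1}^k(y_i - \bar y)^2}\Big\}.$$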
Now we introduce an orthogonal transformation given by

$$t_1 = \sum_{i=1}^k\frac{y_i}{\sqrt k}, \qquad t_r = \frac{\sum_{i=r}^k y_i - (k - r + 1)y_{r-1}}{\sqrt{(k - r + 1)(k - r + 2)}}, \quad r = 2, 3, \ldots, k.$$

The new variables $t_1, \ldots, t_k$ also constitute an independent set of normally distributed random variables with zero means and unit variances and are obviously independent of $\sum_{i=1}^k\sum_{j=1}^n(x_{ij} - \bar x_i)^2$. We now have


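$$P(A_1) = P\Big\{-\sqrt{\frac{k-1}{k}}\,t_2 > \frac{\lambda_\alpha}{\sqrt n}\sqrt{\sum_{i=1}^k\sum_{j=1}^n(x_{ij} - \bar x_i)^2 + \sum_{r=2}^k t_r^2}\Big\}$$

$$= P\Big\{-t_2 > \lambda_\alpha\sqrt{\frac{k}{n(k-1)}}\sqrt{t_2^2 + \chi^2_{n''}}\Big\}$$

$$= P\Big\{t_2 < 0,\ \frac{t_2^2}{\chi^2_{n''}/n''} > \frac{n''k\lambda_\alpha^2}{n(k-1) - k\lambda_\alpha^2}\Big\},$$

where we have used $y_1 - \bar y = -\sqrt{(k-1)/k}\,t_2$ and $\sum_{i=1}^k(y_i - \bar y)^2 = \sum_{r=2}^k t_r^2$, and where $n'' = k(n - 1) + k - 2$ and $\chi^2_{n''} = \sum_{i=1}^k\sum_{j=1}^n(x_{ij} - \bar x_i)^2 + \sum_{r=3}^k t_r^2$ has the chi-square distribution with $n''$ degrees of freedom and is independent of $t_2$. If $F_0$ is used for the value of the F distribution with $n_1 = 1$ and $n_2 = n''$ degrees of freedom which is exceeded with probability $2\alpha/k$, it is a simple matter to verify that the desired approximation is given by

$$\lambda_\alpha = \sqrt{\frac{n(k - 1)F_0}{k(n'' + F_0)}}.$$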
If $\lambda_\alpha$ is determined by the above formula so that $P(A_1) = \alpha/k$, it follows at once from Bonferroni's inequality [5] that the probability of not selecting $D_0$ when all the means are equal will be less than $\alpha$ by an amount which cannot exceed $\tfrac12k(k - 1)P(A_1A_2)$. This quantity is still difficult to evaluate, but in the limit as $n \to \infty$, $\tfrac12k(k - 1)P(A_1A_2)$ can be obtained from tables of the normal bivariate distribution, and is easily shown to be less than $\tfrac12\alpha^2$ for $n$ large enough. Even for small $n$ it seems plausible on an intuitive basis that this bound will be small for values of $\alpha$ ordinarily of interest (say $\alpha \le .05$), although further investigation on this point would obviously be desirable. In any event, if the approximation $\lambda_\alpha = \sqrt{n(k - 1)F_0/[k(n'' + F_0)]}$ is used, it can be asserted that for any $n$ the probability of not selecting $D_0$ when all the means are equal is less than $\alpha$, and for large $n$ the difference between the true probability and $\alpha$ will be less than $\tfrac12\alpha^2$.
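A sketch of the approximation for $\lambda_\alpha$ (the function name `lambda_alpha` is hypothetical). It takes $F_0$ as an input, since the text obtains it from tables of the F distribution with 1 and $n''$ degrees of freedom at upper probability $2\alpha/k$. Equivalently, because an $F(1, \nu)$ variable is the square of a Student $t$ variable with $\nu$ degrees of freedom, $F_0$ is the square of the upper $\alpha/k$ point of $t$ with $n''$ degrees of freedom. No tabulated values are reproduced here.

```python
import math


def lambda_alpha(n, k, f0):
    """lambda_alpha = sqrt(n (k - 1) F0 / (k (n'' + F0))), with n'' = k(n-1) + k - 2
    and F0 the upper 2*alpha/k point of F(1, n'') supplied by the caller."""
    n2 = k * (n - 1) + k - 2
    return math.sqrt(n * (k - 1) * f0 / (k * (n2 + f0)))
```

The result always lies below $\sqrt{n(k-1)/k}$, which the statistic in (1) cannot exceed, since $F_0/(n'' + F_0) < 1$.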
LIMIT THEOREMS ASSOCIATED WITH VARIANTS OF THE
VON MISES STATISTIC¹

By M. ROSENBLATT
University of Chicago

1. Summary. A multidimensional analogue of the von Mises statistic is con- 
sidered for the case of sampling from a multidimensional uniform distribution. 
The limiting distribution of the statistic is shown to be that of a weighted sum of 
independent chi-square random variables with one degree of freedom. The 
weights are the eigenvalues of a positive definite symmetric function. 

A modified statistic of the von Mises type useful in setting up a two sample 
test is shown to have the same limiting distribution under the null hypothesis 
(both samples come from the same population with a continuous distribution 
function) as that of the one-dimensional von Mises statistic. We call the statistics mentioned above von Mises statistics because they are modifications of the $\omega^2$ criterion considered by von Mises [5].

The paper makes use of elements of the theory of stochastic processes. 

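2. Introduction. Let $X_i = (X_{i1}, \ldots, X_{ik})$, $i = 1, \ldots, n$, be a sample from a $k$-dimensional uniform distribution; that is, the $X_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, k$, are independent and uniformly distributed on $[0, 1]$. Let

$$\varphi_t(x) = \begin{cases} 1 & \text{if } x \le t, \\ 0 & \text{if } x > t. \end{cases} \tag{1}$$

The sample distribution function is

$$S_n(t) = S_n(t_1, \ldots, t_k) = \frac1n\sum_{i=1}^n\prod_{j=1}^k\varphi_{t_j}(X_{ij}), \tag{2}$$

where $t = (t_1, \ldots, t_k)$. Consider the process

$$Y_n(t) = \sqrt n\,\big(S_n(t_1, \ldots, t_k) - t_1\cdots t_k\big). \tag{3}$$

Clearly $EY_n(t) = 0$. The covariance of the process is

$$E\big(Y_n(t)Y_n(t')\big) = r_n(t, t') = \frac1n E\Big[\sum_{i=1}^n\Big(\prod_{j=1}^k\varphi_{t_j}(X_{ij}) - t_1\cdots t_k\Big)\Big]\Big[\sum_{i=1}^n\Big(\prod_{j=1}^k\varphi_{t_j'}(X_{ij}) - t_1'\cdots t_k'\Big)\Big]$$

$$= \frac1n\sum_{i=1}^n E\Big[\Big(\prod_{j=1}^k\varphi_{t_j}(X_{ij}) - t_1\cdots t_k\Big)\Big(\prod_{j=1}^k\varphi_{t_j'}(X_{ij}) - t_1'\cdots t_k'\Big)\Big] = \prod_{j=1}^k\min(t_j, t_j') - t_1\cdots t_k\,t_1'\cdots t_k'.$$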

¹ Work done under contract with the Office of Naval Research.

Note that the covariance function $r_n(t, t')$ is independent of $n$ and symmetric in $t$ and $t'$.


Consider the function $r_n(t, t') = r(t, t')$ as the kernel in the following eigenvalue problem

$$\int r(t, t')\varphi(t')\,dt' = \lambda\varphi(t), \tag{4}$$

where the integral is over all components of $t'$, that is, over the unit cube $[0, 1]^k$. The kernel is positive definite (being a covariance function) and hence all its eigenvalues are positive. There are a denumerable number of eigenvalues. Denote the eigenvalues by $\lambda_1, \lambda_2, \ldots$ and the corresponding orthonormal eigenfunctions by $\varphi_1(t), \varphi_2(t), \ldots$. It is understood that each eigenvalue is repeated with the multiplicity of the linearly independent eigenfunctions corresponding to it. Now

$$r(t, t') = \sum_{j=1}^\infty\lambda_j\varphi_j(t)\varphi_j(t') \tag{5}$$

with uniform convergence according to Mercer's theorem. The general theorem of Karhunen on representation of stochastic processes [3] then implies that

$$Y_n(t) = \sum_{j=1}^\infty\sqrt{\lambda_j}\,\varphi_j(t)Y_{nj} \tag{6}$$

in the mean square, where

$$EY_{nj} = 0, \qquad EY_{nj}Y_{nl} = \delta_{jl}.$$


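3. The limiting distribution. As $n \to \infty$, the joint distribution of $Y_n(t_1), \ldots, Y_n(t_m)$ approaches the joint distribution of $Y(t_1), \ldots, Y(t_m)$, where $Y(t)$ is a normal process with mean zero and covariance $r(t, t')$. Obviously the process $Y(t)$ has the representation

$$Y(t) = \sum_{j=1}^\infty\sqrt{\lambda_j}\,\varphi_j(t)Y_j, \tag{7}$$

where the $Y_j$ are independent normal random variables with mean zero and variance one.

THEOREM 1. The von Mises statistic corresponding to a sample of $n$ from a $k$-dimensional uniform distribution is

$$\int_0^1 Y_n^2(t)\,dt = \sum_{j=1}^\infty\lambda_jY_{nj}^2, \tag{8}$$

and the limiting distribution of (8) as $n \to \infty$ is that of

$$\int_0^1 Y^2(t)\,dt = \sum_{j=1}^\infty\lambda_jY_j^2. \tag{9}$$

(Here and below $\int_0^1 \cdots dt$ denotes integration over the unit cube $[0, 1]^k$.)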




PROOF. Now

$$Y_n(t) = \frac{1}{\sqrt n}\sum_{i=1}^n Z_i(t),$$

where the random variables

$$Z_i(t) = \prod_{j=1}^k\varphi_{t_j}(X_{ij}) - t_1\cdots t_k$$

are independent and identically distributed. Then

$$Y_{nj} = \frac{1}{\sqrt n}\sum_{i=1}^n Z_{ij}, \qquad Z_{ij} = \frac{1}{\sqrt{\lambda_j}}\int_0^1 Z_i(t)\varphi_j(t)\,dt.$$

The random vectors

$$(Z_{i1}, \ldots, Z_{iN}), \qquad i = 1, \ldots, n,$$

are independent and identically distributed. Moreover

$$EZ_{ij} = 0, \qquad EZ_{ij}Z_{il} = \delta_{jl}.$$

The multidimensional central limit theorem then implies that the random variables $Y_{nj}$, $j = 1, \ldots, N$, are asymptotically normal, independent random variables with mean zero and variance one as $n \to \infty$. Hence, as $n \to \infty$, $Y_{n1}^2, \ldots, Y_{nN}^2$ are asymptotically independent chi-square random variables with one degree of freedom and mean one. Note that

$$\int_0^1 Y_n^2(t)\,dt = \sum_{j=1}^\infty\lambda_jY_{nj}^2. \tag{10}$$


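Given any $\epsilon > 0$, let $N(\epsilon)$ be such that

$$\sum_{j=N(\epsilon)+1}^\infty\lambda_j < \epsilon^2.$$

Then $U_N = \sum_{j=1}^{N(\epsilon)}\lambda_jY_{nj}^2$ asymptotically has the same distribution as $\sum_{j=1}^{N(\epsilon)}\lambda_jY_j^2$; that is, for sufficiently large $n$

$$\Big|P\{U_N \le z\} - P\Big\{\sum_{j=1}^{N(\epsilon)}\lambda_jY_j^2 \le z\Big\}\Big| < \epsilon.$$

The choice of $N(\epsilon)$ and Tchebycheff's inequality (the remainders $\sum_{j>N(\epsilon)}\lambda_jY_j^2$ and $\sum_{j>N(\epsilon)}\lambda_jY_{nj}^2$ are nonnegative with expectation less than $\epsilon^2$, so each exceeds $\epsilon$ with probability less than $\epsilon$) imply

$$P\Big\{\int_0^1 Y^2(t)\,dt \le z\Big\} \le P\Big\{\sum_{j=1}^{N(\epsilon)}\lambda_jY_j^2 \le z\Big\} \le P\Big\{\int_0^1 Y^2(t)\,dt \le z + \epsilon\Big\} + \epsilon,$$

$$P\Big\{\int_0^1 Y_n^2(t)\,dt \le z\Big\} \le P\{U_N \le z\} \le P\Big\{\int_0^1 Y_n^2(t)\,dt \le z + \epsilon\Big\} + \epsilon.$$

Hence, for sufficiently large $n$,

$$P\Big\{\int_0^1 Y_n^2(t)\,dt \le z\Big\} < P\Big\{\int_0^1 Y^2(t)\,dt \le z + \epsilon\Big\} + 2\epsilon, \qquad P\Big\{\int_0^1 Y^2(t)\,dt \le z\Big\} < P\Big\{\int_0^1 Y_n^2(t)\,dt \le z + \epsilon\Big\} + 2\epsilon.$$

The distribution function of $\int_0^1 Y^2(t)\,dt$ is continuous. Therefore the limiting distribution of $\int_0^1 Y_n^2(t)\,dt$ as $n \to \infty$ is the same as that of $\int_0^1 Y^2(t)\,dt$.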
0 


The distribution function of $\int_0^1 Y^2(t)\,dt$ has been computed in the 1-dimensional case $(k = 1)$. The eigenvalues of (4) are then $\lambda_j = 1/(\pi^2j^2)$, $j = 1, 2, \ldots$, and hence the characteristic function of (9) is $\prod_{j=1}^\infty\{1 - 2it/(\pi^2j^2)\}^{-1/2}$. One can invert the characteristic function by a contour integration and obtain the distribution function of (9) as given by Smirnov [5, 2]. It would be of great interest to find the eigenvalues of (4) when $k > 1$.
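A simple check on (5) and (8): integrating the covariance along the diagonal gives $E\int_0^1 Y_n^2(t)\,dt = \int_0^1 r(t, t)\,dt = \sum_j\lambda_j = 2^{-k} - 3^{-k}$ for every $n$, which for $k = 1$ is $1/6 = \sum_j 1/(\pi^2j^2)$. The sketch below computes the statistic (8) exactly, using $\int_0^1\varphi_t(x)\varphi_t(x')\,dt = 1 - \max(x, x')$ and $\int_0^1 t\,\varphi_t(x)\,dt = (1 - x^2)/2$ in each coordinate, and compares a seeded Monte Carlo mean with $2^{-k} - 3^{-k}$. The function names and simulation sizes are arbitrary choices for illustration.

```python
import math
import random


def von_mises_statistic(points):
    """Exact n * integral over [0,1]^k of (S_n(t) - t_1...t_k)^2 dt, the left side of (8)."""
    n, k = len(points), len(points[0])
    cross = 0.0      # sum over all pairs (i, l) of prod_j (1 - max(x_ij, x_lj))
    for p in points:
        for q in points:
            prod = 1.0
            for a, b in zip(p, q):
                prod *= 1.0 - max(a, b)
            cross += prod
    linear = 0.0     # sum over i of prod_j (1 - x_ij^2) / 2
    for p in points:
        prod = 1.0
        for a in p:
            prod *= (1.0 - a * a) / 2.0
        linear += prod
    return cross / n - 2.0 * linear + n / 3.0 ** k


def monte_carlo_mean(k, n, reps, seed=0):
    """Average of the statistic over reps uniform samples of size n in [0,1]^k."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        pts = [tuple(rng.random() for _ in range(k)) for _ in range(n)]
        total += von_mises_statistic(pts)
    return total / reps


print(monte_carlo_mean(2, 15, 1500, seed=1), 1 / 4 - 1 / 9)
```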


4. The two sample test. Let $X_{1i}$, $i = 1, \ldots, n$, and $X_{2k}$, $k = 1, \ldots, m$, be samples of $n$ and $m$ respectively from a population with some continuous distribution function $F(x)$. Let $S_1(t)$, $S_2(t)$ be the corresponding sample distribution functions. Various people [4] have suggested using

$$\frac{mn}{m+n}\int_{-\infty}^{\infty}\big(S_1(t) - S_2(t)\big)^2\,d\Big(\frac{S_1(t) + S_2(t)}{2}\Big) \tag{11}$$

as a test statistic for the two sample problem.
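A sketch of statistic (11) for two samples without ties (the function name is hypothetical). The integrator $(S_1 + S_2)/2$ places mass $1/(2n)$ at each point of the first sample and $1/(2m)$ at each point of the second, so the integral becomes a finite sum; the empirical distribution functions are taken to be right-continuous, a convention we assume rather than one stated in the text. Because the statistic depends on the data only through their relative order, applying a strictly increasing transformation, such as $F$ itself, to both samples leaves it unchanged.

```python
def two_sample_statistic(x, y):
    """(11): mn/(m+n) times the integral of (S_1 - S_2)^2 against (S_1 + S_2)/2,
    for samples x (size n) and y (size m) with no tied values."""
    n, m = len(x), len(y)

    def ecdf(sample, t):
        # Right-continuous empirical distribution function: #{obs <= t} / size.
        return sum(1 for s in sample if s <= t) / len(sample)

    integral = sum((ecdf(x, t) - ecdf(y, t)) ** 2 for t in x) / (2 * n)
    integral += sum((ecdf(x, t) - ecdf(y, t)) ** 2 for t in y) / (2 * m)
    return m * n / (m + n) * integral


print(two_sample_statistic([1.0, 2.0], [3.0]))
```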

THEOREM 2. Statistic (11) has the same limiting distribution when $n \to \infty$, $m/n \to \lambda > 0$ as the one-dimensional von Mises statistic under the assumption that both samples come from the same continuous population.

Consider computing the statistic for samples $F(X_{1i})$, $i = 1, \ldots, n$, $F(X_{2k})$, $k = 1, \ldots, m$, of $n$ and $m$ respectively from a population with the uniform distribution. The value of the statistic is the same as that obtained from the original samples $\{X_{1i}\}$, $\{X_{2k}\}$, and consequently it has the same distribution function as the latter. We need then only consider the statistic

$$\frac{mn}{m+n}\int_0^1\big(S_1(t) - S_2(t)\big)^2\,d\Big(\frac{S_1(t) + S_2(t)}{2}\Big) \tag{12}$$

when the samples are from a uniformly distributed population. It is obvious that

$$\frac{mn}{m+n}\int_0^1\big(S_1(t) - S_2(t)\big)^2\,dt \tag{13}$$

has the same limiting distribution as the one-dimensional von Mises statistic when $n \to \infty$, $m/n \to \lambda > 0$. It would then be sufficient to show that

$$\frac{mn}{m+n}\int_0^1\big(S_1(t) - S_2(t)\big)^2\,d\Big(\frac{S_1(t) + S_2(t)}{2} - t\Big) \tag{14}$$

converges to zero in probability when $n \to \infty$, $m/n \to \lambda > 0$. Now


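$$\frac{mn}{m+n}\int_0^1\big(S_1(t) - S_2(t)\big)^2\,d\Big(\frac{S_1(t) + S_2(t)}{2} - t\Big) = \frac12\,\frac{mn}{m+n}\int_0^1\big(S_1(t) - S_2(t)\big)^2\,d\big(S_1(t) - t\big) + \frac12\,\frac{mn}{m+n}\int_0^1\big(S_1(t) - S_2(t)\big)^2\,d\big(S_2(t) - t\big),$$

and by a series of integrations by parts these terms can be expressed through

$$\frac{mn}{m+n}\int_0^1\big(S_1(t) - t\big)^2\,d\big(S_2(t) - t\big) \quad\text{and}\quad \frac{mn}{m+n}\int_0^1\big(S_2(t) - t\big)^2\,d\big(S_1(t) - t\big).$$

The proof is complete if one can show that both expressions directly above converge to zero in probability. By symmetry it is enough to consider one of the expressions.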

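Let

$$x_1(t) = n^{1/2}\big(S_1(t) - t\big), \qquad x_2(t) = m^{1/2}\big(S_2(t) - t\big). \tag{15}$$

Now

$$Ex_i(t) = 0, \qquad Ex_i(\tau)x_i(t) = \min(\tau, t) - \tau t, \qquad i = 1, 2. \tag{16}$$

We use the following transformation suggested by Doob [1]:

$$Z_i(t) = (1 + t)\,x_i\Big(\frac{t}{1 + t}\Big), \qquad 0 \le t < \infty,\ i = 1, 2. \tag{17}$$

The processes $Z_1(t)$, $Z_2(t)$ are independent of each other. Moreover, each of them is an orthogonal process with

$$EZ_i(t) = 0, \qquad EZ_i(t)Z_i(\tau) = \min(\tau, t). \tag{18}$$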








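A simple computation making use of (2), (15), (17) yields, for $t, \tau \ge 0$,

$$E\big(Z_1^2(t)Z_1^2(\tau)\big) = \frac1n\Big[\frac{\min(t, \tau)}{1 + \min(t, \tau)} + \frac{\min^2(t, \tau)\big(\max(t, \tau) - \min(t, \tau)\big)}{(1 + t)(1 + \tau)} + \frac{t^2\tau^2}{1 + \max(t, \tau)}\Big] + \frac{n - 1}{n}\big[t\tau + 2\min^2(t, \tau)\big], \tag{19}$$

and in particular

$$E\big(Z_1^4(t)\big) = \frac{t + t^4}{n(1 + t)} + \frac{3(n - 1)}{n}\,t^2. \tag{20}$$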
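Formula (20) can be checked exactly for small $n$, since $nS_1(s)$ is binomial with parameters $n$ and $s$. The sketch below (function names hypothetical) computes $E Z_1^4(t)$ from that binomial law after the substitution $s = t/(1 + t)$ of (17), and compares it with (20) in rational arithmetic.

```python
from fractions import Fraction
from math import comb


def fourth_moment_exact(n, t):
    """E Z^4(t) for Z(t) = (1 + t) x(t / (1 + t)), x(s) = sqrt(n) (S_n(s) - s),
    from the binomial law of n S_n(s); t should be a Fraction."""
    s = t / (1 + t)
    # x(s)^4 = (n S_n(s) - n s)^4 / n^2
    ex4 = sum(comb(n, j) * s ** j * (1 - s) ** (n - j) * (j - n * s) ** 4
              for j in range(n + 1)) / Fraction(n * n)
    return (1 + t) ** 4 * ex4


def fourth_moment_formula(n, t):
    """Right-hand side of (20)."""
    return (t + t ** 4) / (n * (1 + t)) + Fraction(3 * (n - 1), n) * t ** 2


print(fourth_moment_exact(4, Fraction(1, 3)), fourth_moment_formula(4, Fraction(1, 3)))
```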


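Now

$$\frac{mn}{m+n}\int_0^1\big(S_1(t) - t\big)^2\,d\big(S_2(t) - t\big) = \frac{m^{1/2}}{m+n}\int_0^1 x_1^2(t)\,dx_2(t)$$

$$= -\frac{m^{1/2}}{m+n}\int_0^1(1 - t)^2Z_1^2\Big(\frac{t}{1-t}\Big)Z_2\Big(\frac{t}{1-t}\Big)dt + \frac{m^{1/2}}{m+n}\int_0^1(1 - t)^3Z_1^2\Big(\frac{t}{1-t}\Big)\,dZ_2\Big(\frac{t}{1-t}\Big) \tag{21}$$

$$= -\frac{m^{1/2}}{m+n}\int_0^\infty\frac{Z_1^2(t)Z_2(t)}{(1 + t)^4}\,dt \tag{22}$$

$$\quad + \frac{m^{1/2}}{m+n}\int_0^\infty\frac{Z_1^2(t)}{(1 + t)^3}\,dZ_2(t). \tag{23}$$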


The random variables (22), (23) are the limits almost everywhere of


$$-\frac{m^{1/2}}{m+n}\int_0^T \frac{Z_1^2(t)Z_2(t)}{(1+t)^4}\,dt \qquad (24)$$

and

$$\frac{m^{1/2}}{m+n}\int_0^T \frac{Z_1^2(t)}{(1+t)^3}\,dZ_2(t), \qquad (25)$$


respectively, as $T \to \infty$. The independence of the orthogonal processes $Z_1(t)$, $Z_2(t)$ implies that the second moments of (24), (25) are


$$\frac{m}{(m+n)^2}\int_0^T\!\!\int_0^T \frac{\min(\tau,t)\,E\bigl(Z_1^2(t)Z_1^2(\tau)\bigr)}{(1+t)^4(1+\tau)^4}\,dt\,d\tau$$

and

$$\frac{m}{(m+n)^2}\int_0^T \frac{E\bigl(Z_1^4(t)\bigr)}{(1+t)^6}\,dt,$$


respectively. 
Making use of (19), (20) one can see that (24), (25) converge in mean square as $T \to \infty$ to (22), (23) respectively. But then the second moments of (22), (23) exist and are given by


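$$\frac{m}{(m+n)^2}\int_0^\infty\!\!\int_0^\infty \frac{\min(\tau,t)\,E\bigl(Z_1^2(t)Z_1^2(\tau)\bigr)}{(1+t)^4(1+\tau)^4}\,dt\,d\tau \qquad (26)$$

and

$$\frac{m}{(m+n)^2}\int_0^\infty \frac{E\bigl(Z_1^4(t)\bigr)}{(1+t)^6}\,dt, \qquad (27)$$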


respectively. The second moments (26), (27) converge to zero as $n \to \infty$, $m/n \to \lambda > 0$, and hence the random variables (22), (23) converge to zero in probability as $n \to \infty$, $m/n \to \lambda > 0$. This in turn implies that (21) converges to zero in probability. The same argument implies that


$$\frac{mn}{m+n}\int_0^1 \bigl(S_2(t)-t\bigr)^2\, d\bigl(S_1(t)-t\bigr)$$


converges to zero in probability. Hence (14) converges to zero in probability as $n \to \infty$, $m/n \to \lambda > 0$, and the proof is complete.

NOTES


A MARKOV CHAIN DERIVATION OF DISCRETE DISTRIBUTIONS 
By F. G. Foster 
Magdalen College, Oxford 


Let an irreducible, aperiodic Markov chain¹ have the matrix of transition probabilities $A = [p_{ij}]$ $(i, j = 0, 1, 2, \ldots)$. Then as usual we shall have

$$p_{ij} \ge 0 \quad \text{for all } i \text{ and } j, \qquad \sum_{j=0}^{\infty} p_{ij} = 1 \quad \text{for all } i.$$


It is known ([1], p. 325) that the $n$th power of $A$, $A^n$, tends to a limiting matrix as $n \to \infty$,

$$\lim_{n\to\infty} A^n = B,$$

and $B$ will either be null or have identical rows,

$$\mathbf{x} = (x_0, x_1, \ldots),$$

such that $x_i > 0$ for all $i$ and $\sum_{i=0}^{\infty} x_i = 1$. Moreover we shall have

$$\mathbf{x}A = \mathbf{x}.$$


In this way we may associate with any matrix $A$ of the type under consideration either the null vector or a probability distribution represented by $\mathbf{x}$. Conversely, to any distribution $\mathbf{x}$ there will correspond a matrix $A$ (not necessarily unique). A method of constructing such a matrix is given below and illustrated with some examples.


Let $\{a_i\}$ $(i = 0, 1, 2, \ldots)$ be a sequence of positive numbers and define $A_n = \sum_{i=0}^{n} a_i$ $(n = 0, 1, 2, \ldots)$. Now let

$$A = \begin{pmatrix} a_0/A_1 & a_1/A_1 & 0 & 0 & \cdots \\ a_0/A_2 & a_1/A_2 & a_2/A_2 & 0 & \cdots \\ a_0/A_3 & a_1/A_3 & a_2/A_3 & a_3/A_3 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$

Then $A$ satisfies the usual conditions for being a transition probability matrix; moreover it is clearly irreducible, and, since its diagonal elements are all positive, it is also aperiodic.

¹For definitions of all terms used see [1].

Now suppose $\mathbf{x}$ is a vector such that


$$\mathbf{x}A = \mathbf{x}.$$


If we regard this as an equation in x, we find that in our special case the solution 
is easily obtained. We have 


It follows by induction that 


$$x_n = x_0\,\frac{a_1 a_2 \cdots a_n}{A_0 A_1 \cdots A_{n-1}}, \qquad n \ge 1,$$

and so $\mathbf{x}$ is uniquely determined, apart from a common factor. For $\mathbf{x}$ to be a distribution, we must have in addition


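$$1 = \sum_{n=0}^{\infty} x_n = x_0 + x_0 \sum_{n=1}^{\infty} \frac{a_1 \cdots a_n}{A_0 \cdots A_{n-1}},$$

$$x_0 = 1 \Big/ \left(1 + \sum_{n=1}^{\infty} \frac{a_1 \cdots a_n}{A_0 \cdots A_{n-1}}\right),$$

and it follows that the matrix $B$ $(= \lim A^n)$ is non-null if and only if

$$\sum_{n=1}^{\infty} \frac{a_1 \cdots a_n}{A_0 \cdots A_{n-1}} < \infty,$$

and each row of $B$ will consist of the distribution $\mathbf{x}$.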
Conversely, if we are given a distribution $\mathbf{x}$, we may calculate the sequence $\{a_i\}$ which possesses the required property. We have only to put

$$\frac{a_n}{A_{n-1}} = \frac{x_n}{x_{n-1}}, \qquad n \ge 1.$$

Then we find as required that

$$x_0\,\frac{a_1 \cdots a_n}{A_0 \cdots A_{n-1}} = x_n.$$

The sequence $\{a_n\}$ is now easily calculated. We have

$$A_n = A_{n-1}\left(1 + \frac{x_n}{x_{n-1}}\right), \qquad A_n = a_0 \prod_{i=1}^{n}\left(1 + \frac{x_i}{x_{i-1}}\right).$$


Therefore 





and, by iteration, 


Hence, putting (as we may, since a common factor is unimportant) $a_0 = 1$, we have

$$a_n = \frac{x_n}{x_{n-1}} \prod_{i=1}^{n-1}\left(1 + \frac{x_i}{x_{i-1}}\right), \qquad n \ge 1.$$
The above procedure may be given the following interpretation. Consider a particle performing a random walk on the integers $0, 1, 2, \ldots$ in such a manner that when in position $n$ it has probabilities in the ratios

$$a_0 : a_1 : a_2 : \cdots : a_{n+1}$$

of jumping at the next move into one of the positions $0, 1, 2, \ldots, n+1$. That is, the particle can move one step along, stay where it is, or move back to any previous position. The distribution $\{x_i\}$ may then be interpreted as giving the asymptotic probabilities for its position after a large number of moves, and we have shown how the sequence $\{a_i\}$ may be calculated to give any required asymptotic distribution $\{x_i\}$ (with $x_i > 0$ for all $i$).
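The construction can be checked numerically. The Python below is an illustration, and its function names are ours. It computes $a_0 = 1$ and $a_n = (x_n/x_{n-1})A_{n-1}$ from a target distribution and builds row $n$ of $A$ as $a_j/A_{n+1}$ for $j = 0, \ldots, n+1$. It then checks $\mathbf{x}A \approx \mathbf{x}$ on a chain truncated to finitely many states. The truncation is an assumption of the sketch and is harmless only when the tail of $\mathbf{x}$ is negligible.

```python
from math import exp, factorial


def foster_weights(x):
    """Return a_0 = 1 and a_n = (x_n / x_{n-1}) * A_{n-1}, with A_{n-1} = a_0 + ... + a_{n-1}."""
    a = [1.0]
    partial = 1.0  # running value of A_{n-1}
    for n in range(1, len(x)):
        a_n = x[n] / x[n - 1] * partial
        a.append(a_n)
        partial += a_n
    return a


def transition_row(a, n):
    """Row n of A: jump to j = 0, ..., n + 1 with probability a_j / A_{n+1}."""
    weights = a[: n + 2]
    total = sum(weights)
    return [w / total for w in weights]


def apply_chain(x, a):
    """(xA)_j computed from rows 0, ..., len(x) - 2 only (a truncated chain)."""
    out = [0.0] * len(x)
    for n in range(len(x) - 1):
        for j, p in enumerate(transition_row(a, n)):
            out[j] += x[n] * p
    return out


lam = 2.0
poisson = [exp(-lam) * lam ** n / factorial(n) for n in range(60)]
a = foster_weights(poisson)
xa = apply_chain(poisson, a)
print([round(v, 6) for v in a[:5]])                      # approximately [1, 2, 3, 4, 5]
print(max(abs(xa[j] - poisson[j]) for j in range(20)))   # tiny: x is stationary
```

For the Poisson distribution with $\lambda = 2$ the weights come out as $a_n = n + 1$, in line with the general formula of Example (a).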

In some cases where x is a recognised distribution the sequence {a;} has a par- 
ticularly simple form. 

Example (a). The Poisson distribution. Let us take

$$x_n = e^{-\lambda}\lambda^n/n!, \qquad \lambda > 0, \quad n = 0, 1, 2, \ldots.$$

We find that

$$a_n = \frac{\lambda(\lambda+1)\cdots(\lambda+n-1)}{n!}, \qquad n = 1, 2, \ldots,$$

with $a_0 = 1$. Thus the $n$th row of $A$ is a truncated negative binomial distribution having $n + 1$ terms. In particular, when $\lambda = 1$, $A$ takes the very simple form wherein $a_n = 1$ for all $n$, and the rule governing the motion of the particle is that when it is in the $n$th position it has equal probabilities of jumping into any of the positions $0, 1, 2, \ldots, n+1$. We have then the result that its asymptotic position is a random variable with the Poisson distribution $\{1/(e\,n!)\}$ $(n = 0, 1, 2, \ldots)$.

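Example (b). The negative binomial distribution. Let us take

$$x_n = (1-\beta)^{\lambda}\,\frac{\lambda(\lambda+1)\cdots(\lambda+n-1)}{n!}\,\beta^n, \qquad n = 1, 2, \ldots,$$

$$x_0 = (1-\beta)^{\lambda},$$

where $0 < \beta < 1$, $\lambda > 0$. We find that

$$a_n = \frac{\beta(\lambda+n-1)}{n!}\,(1+\lambda\beta)\bigl(2+\beta(\lambda+1)\bigr)\cdots\bigl(n-1+\beta(\lambda+n-2)\bigr), \qquad n = 1, 2, \ldots,$$

with $a_0 = 1$. In particular, when $\lambda = 1$, we have

$$a_n = \beta(1+\beta)^{n-1}, \qquad n = 1, 2, \ldots,$$

with $a_0 = 1$, and each row of $A$ is a truncated modified geometric distribution.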
ON MINIMUM VARIANCE ESTIMATORS¹
By J. Kiefer
Cornell University 


Chapman and Robbins [1] have given a simple improvement on the Cramér- 
Rao inequality without postulating the regularity assumptions under which the 
latter is usually proved. The purpose of this note is to show by examples how a 
similarly derived stronger inequality (see equation (2)) may be used to verify 
that certain estimators are uniform minimum variance unbiased estimators. 
This stronger inequality is that which (under additional restrictions) was shown 
in [2] to be the best possible, but is in a more useful form for applications than 
the form given in [2]. For simplicity we consider only an inequality on the vari- 
ance of unbiased estimators, but inequalities on other moments than the second 
(see [2]), or for biased estimators, may be found similarly. The two examples 
considered here are ones where the regularity conditions of [2] are not satisfied, 
where the method of [1] does not give the best bound, and where the method of 
this note is used to find the best bound and thus to verify that certain estimators 
are uniform minimum variance unbiased. (For the examples considered this also 
follows from completeness of the sufficient statistic; the method used here ap- 
plies, of course, more generally.) 

Let $X$ be a chance variable with density $f(x;\theta)$ with respect to some fixed $\sigma$-finite measure $\mu$ $(\theta \in \Omega,\ x \in \mathfrak{X})$. We suppose suitable Borel fields to be given and $f(x;\theta)$ to be measurable in its arguments. $\Omega$ is a subset of the real line. For each $\theta$, let $\Omega_\theta = \{h \mid (\theta+h) \in \Omega\}$. For fixed $\theta$, let $\lambda_1$ and $\lambda_2$ be any two probability measures on $\Omega_\theta$ such that $E_i h = \int_{\Omega_\theta} h\,d\lambda_i(h)$ exists for $i = 1, 2$. Then, for any $t(x)$ for which $E_\theta t = \theta$, we have

$$\int_{\mathfrak{X}} (t-\theta)\sqrt{f(x;\theta)}\;\frac{\int_{\Omega_\theta} f(x;\theta+h)\,d[\lambda_1(h)-\lambda_2(h)]}{\sqrt{f(x;\theta)}}\,d\mu = E_1h - E_2h. \qquad (1)$$

¹ Research sponsored by the Office of Naval Research.


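Applying Schwarz's inequality, we have after some obvious manipulations,

$$E_\theta(t-\theta)^2 \ge \sup_{\lambda_1,\lambda_2} \frac{(E_1h - E_2h)^2}{\displaystyle\int_{\mathfrak{X}} \frac{\left\{\int_{\Omega_\theta} f(x;\theta+h)\,d[\lambda_1(h)-\lambda_2(h)]\right\}^2}{f(x;\theta)}\,d\mu}, \qquad (2)$$

where for each $\theta$ the supremum is taken over all $\lambda_1$ and $\lambda_2$ for which $\lambda_1 \ne \lambda_2$ and for which the integrand of the integral over $\mathfrak{X}$ is defined a.e. $(\mu)$.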

We remark that the supremum of (2) is easily seen to be unimproved if $\lambda_i$ and $E_ih$ are multiplied by real numbers $c_i$ $(i = 1, 2)$ with respect to which the supremum is also taken. From this fact it is easy to verify that the right side of (2) must coincide with the expression given in Theorem 4 of [2] (for $s = 2$ there), which Barankin shows to be the best possible bound (under the assumption that $f(x;\theta+h)/f(x;\theta)$ is defined a.e. $(\mu)$ and, for our case, belongs to $L_2$ with respect to the measure $\nu(A) = \int_A f(x;\theta)\,d\mu$ for all $h \in \Omega_\theta$). However, the form of equation (2) is more useful for applications, since one can sometimes find $\lambda_i$ for which the bound is attained but where no discrete $\lambda_i$ (essentially what are used in the form of [2]) actually give this bound.

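It will often suffice in applications to let $\lambda_2$ give measure one to the point $h = 0$. This gives

$$E_\theta(t-\theta)^2 \ge \sup_{\lambda_1} \left\{ \frac{(E_1h)^2}{\displaystyle\int_{\mathfrak{X}} \frac{\left[\int_{\Omega_\theta} f(x;\theta+h)\,d\lambda_1(h)\right]^2}{f(x;\theta)}\,d\mu - 1} \right\}. \qquad (3)$$

If we consider only those $\lambda_1$ which give measure one to a single $h$, we obtain

$$E_\theta(t-\theta)^2 \ge \frac{1}{\displaystyle\inf_h \frac{1}{h^2}\left[\int_{\mathfrak{X}} \frac{f^2(x;\theta+h)}{f(x;\theta)}\,d\mu - 1\right]}, \qquad (4)$$

where the infimum is over all $h \ne 0$ for which $h \in \Omega_\theta$ and for which $f(x;\theta) = 0$ implies $f(x;\theta+h) = 0$ a.e. $(\mu)$. The latter is precisely the condition of equation (2) of [1], the result of which thus coincides with (4).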

We now give two examples where the right side of (3) suffices to give the best bound, where the right side of (4) does not give the best bound, and where the previously mentioned restrictions of [2] are not satisfied. In both examples $\mu$ is Lebesgue measure on the real line.
EXAMPLE 1. We have $n$ observations from a rectangular distribution from 0 to $\theta$ $(\Omega = \{\theta \mid \theta > 0\})$. It suffices to consider the maximum $Y$ of the observations, whose density is $ny^{n-1}/\theta^n$ for $0 \le y \le \theta$, and 0 elsewhere. For $n = 1$, the denominator of the right side of (4) becomes $\inf_{-\theta<h<0}\{-1/[h(\theta+h)]\}$, so that (4) gives the bound $\theta^2/4$. It would be too tedious to carry this calculation out for each $n$, but it can be shown that, as $n \to \infty$, (4) asymptotically gives the bound $.648\,\theta^2/n^2$. On the other hand, if we put $d\lambda_1(h) = [(n+1)/\theta](h/\theta+1)^n\,dh$ for $-\theta < h < 0$, the term in braces on the right side of (3) becomes $\theta^2/[n(n+2)]$, which is in fact attained as the variance of the unbiased estimator $[(n+1)/n]Y$.
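The value $\theta^2/[n(n+2)]$ quoted for (3) can be checked by quadrature. The sketch below uses Python and the midpoint rule; the grid size and the function names are our own choices. It evaluates the term in braces of (3) for the $\lambda_1$ given above and compares the result with $\theta^2/[n(n+2)]$. For $n = 1$ the result exceeds the bound $\theta^2/4$ from (4). The inner integral runs only over $y - \theta < h < 0$, the range on which $f(y;\theta+h)$ is positive.

```python
def rectangular_bound(n, theta=1.0, grid=200):
    """Term in braces of (3) for the maximum Y of n uniform(0, theta) observations,
    with d lambda_1(h) = [(n+1)/theta] (h/theta + 1)^n dh on (-theta, 0)."""

    def f(y, th):
        # density of Y when the parameter is th
        return n * y ** (n - 1) / th ** n if 0.0 <= y <= th else 0.0

    def lam1(h):
        return (n + 1) / theta * (h / theta + 1.0) ** n

    # numerator: (E_1 h)^2, midpoint rule over (-theta, 0)
    dh = theta / grid
    e1h = sum(h * lam1(h) for h in (-theta + (k + 0.5) * dh for k in range(grid))) * dh

    # denominator: integral over y of [int f(y; theta+h) d lambda_1(h)]^2 / f(y; theta), minus 1
    dy = theta / grid
    denom = 0.0
    for i in range(grid):
        y = (i + 0.5) * dy
        step = (theta - y) / grid          # f(y; theta + h) > 0 only for h > y - theta
        inner = sum(f(y, theta + h) * lam1(h)
                    for h in ((y - theta) + (k + 0.5) * step for k in range(grid))) * step
        denom += inner ** 2 / f(y, theta) * dy
    return e1h ** 2 / (denom - 1.0)


for n in (1, 2, 5):
    print(n, rectangular_bound(n), 1.0 / (n * (n + 2)))   # the two columns agree closely
```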

EXAMPLE 2. We have $m$ observations from the distribution with density $e^{-(x-\theta)}$ for $x \ge \theta$ and 0 elsewhere ($\Omega$ is the real line). Here the minimum $Z$ of the observations is sufficient and has density $me^{-m(z-\theta)}$, $z \ge \theta$. The denominator of (4) is $\inf_{h>0}\{[e^{mh}-1]/h^2\}$. The infimum is attained for $mh = 1.5936$, and yields $.648/m^2$ as the bound given by (4). On the other hand, putting $d\lambda_1(h) = me^{-mh}\,dh$ for $0 < h < \infty$ and 0 otherwise, the expression in braces of (3) becomes $1/m^2$, which is actually attained as the variance of the unbiased estimator $Z - 1/m$.
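The constant in Example 2 comes from minimising $(e^u - 1)/u^2$ with $u = mh$. Setting the derivative to zero gives $e^u(2-u) = 2$. A short bisection sketch (our own illustration) recovers $mh \approx 1.5936$ and the bound $\approx .648/m^2$ from (4), to be compared with $1/m^2$ from (3).

```python
import math


def exponential_cr_constant():
    """Minimise (e^u - 1)/u^2 over u = m h > 0 (the denominator of (4) divided by m^2).

    The minimiser solves e^u (2 - u) = 2. Bisection on (1, 2) suffices because the
    left side minus 2 decreases there from e - 2 > 0 to -2 < 0.
    """
    def g(u):
        return math.exp(u) * (2.0 - u) - 2.0

    lo, hi = 1.0, 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    return u, u * u / math.expm1(u)   # (m h at the infimum, bound (4) multiplied by m^2)


u_star, c = exponential_cr_constant()
print(round(u_star, 4), round(c, 4))   # about 1.5936 and 0.6476
```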
BHATTACHARYYA BOUNDS WITHOUT REGULARITY ASSUMPTIONS


By D. A. S. Fraser and Irwin Guttman


University of Toronto 


1. Summary. In [1] a method for removing the regularity conditions from the Cramér-Rao Inequality was given and applied to the estimation of a single real parameter. It was noted there that the method would extend to problems more general than estimating a single real parameter. However, the method also extends within the estimation of a single real parameter, where it produces analogues of the Bhattacharyya bounds with and without nuisance parameters.


2. Introduction. Let $\mu(x)$ be a $\sigma$-finite measure defined over an additive class $\mathcal{A}$ of subsets of a space $\mathcal{X}$, and let $X$ be a random variable with density

$$f(x;\theta_1,\ldots,\theta_k)$$

with respect to $\mu(x)$. $\theta_1, \ldots, \theta_k$ are real with $(\theta_1,\ldots,\theta_k) = \theta \in \Omega \subset R^k$. The carrier $S(\theta_1,\ldots,\theta_k)$ of the distribution is defined by

$$S(\theta_1,\ldots,\theta_k) = \{x \mid f(x;\theta_1,\ldots,\theta_k) > 0\}.$$

We restrict our consideration of $S(\theta_1,\ldots,\theta_k)$ to the positive sample space of the measure $\mu$.

The following lemma given in [2] will be needed:

Lemma. If the real valued random variables $T, S_1, \ldots, S_r$ satisfy

$$E(S_i) = 0, \quad i = 1, \ldots, r, \qquad E(TS_i) = \begin{cases} 1, & i = 1, \\ 0, & i = 2, \ldots, r, \end{cases}$$

then $\sigma_T^2 \ge 1/\sigma^2_{S_1\cdot S_2\cdots S_r}$, where $\sigma^2_{S_1\cdot S_2\cdots S_r}$ is the variance of the residual of the regression fit of $S_2, \ldots, S_r$ to $S_1$.

Proof. Since the covariance of $T$ and $S_1 - \sum_{i=2}^{r}\beta_i S_i$ is 1, the product of the variances is greater than or equal to 1; $\sigma_T^2 \ge 1/\sigma^2_{S_1 - \sum\beta_i S_i}$. The sharpest inequality is obtained by a regression fit.


3. Bhattacharyya bounds. Let $T$ be an unbiased estimate of the parameter $\theta_1$, and define $\{S_{i_1\cdots i_k}\}$ as follows:

$$S_{10\cdots0} = \frac{1}{f(x;\theta_1^{(0)},\ldots,\theta_k^{(0)})}\,\Delta_{\theta_1^{(1)}}\, f(x;\theta_1^{(0)},\ldots,\theta_k^{(0)}),$$

$$S_{i_1\cdots i_k} = \frac{1}{f(x;\theta_1^{(0)},\ldots,\theta_k^{(0)})}\,\Delta^{i_1}_{\theta_1^{(1)}\cdots\theta_1^{(i_1)}}\cdots\Delta^{i_k}_{\theta_k^{(1)}\cdots\theta_k^{(i_k)}}\, f(x;\theta_1^{(0)},\ldots,\theta_k^{(0)}), \qquad (3.1)$$

where $\Delta^{i}_{\theta^{(1)}\cdots\theta^{(i)}}\,g(\theta^{(0)})$ is the $i$th divided difference,

$$\Delta^{i}_{\theta^{(1)}\cdots\theta^{(i)}}\,g(\theta^{(0)}) = \Delta_{\theta^{(i)}}\,\Delta^{i-1}_{\theta^{(1)}\cdots\theta^{(i-1)}}\,g(\theta^{(0)}), \qquad \Delta_{\theta^{(1)}}\,g(\theta^{(0)}) = \frac{g(\theta^{(1)}) - g(\theta^{(0)})}{\theta^{(1)} - \theta^{(0)}},$$

where the expressions are to be considered as functions of $\theta^{(0)}$ for further differencing. Also we introduce the following assumption concerning the carrier of the distribution.
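Assumption A. $S(\theta_1^{(0)},\ldots,\theta_k^{(0)}) \supset S(\theta_1^{(i_1)},\ldots,\theta_k^{(i_k)})$ for all $(i_1,\ldots,i_k)$ for which $S$'s have been defined.

On the basis of Assumption A, we have

$$E_{\theta_0}(S_{i_1\cdots i_k}) = \int \Delta^{i_1}_{\theta_1^{(1)}\cdots\theta_1^{(i_1)}}\cdots\Delta^{i_k}_{\theta_k^{(1)}\cdots\theta_k^{(i_k)}}\, f(x;\theta_1^{(0)},\ldots,\theta_k^{(0)})\,d\mu(x) = 0,$$

$$E_{\theta_0}(TS_{i_1\cdots i_k}) = \int \Delta^{i_1}_{\theta_1^{(1)}\cdots\theta_1^{(i_1)}}\cdots\Delta^{i_k}_{\theta_k^{(1)}\cdots\theta_k^{(i_k)}}\, f(x;\theta_1^{(0)},\ldots,\theta_k^{(0)})\,T\,d\mu(x) = \begin{cases} 1, & \text{if } i_1 = 1,\ i_2 = \cdots = i_k = 0, \\ 0, & \text{otherwise.} \end{cases}$$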
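Letting $S_s$ stand for any one of the above defined $S$'s except $S_{10\cdots0}$ and applying the lemma of Section 2, the following inequality is obtained (subject to Assumption A) for the variance of the unbiased estimate $T$:

$$\operatorname{var}_{\theta_0} T \ge \frac{1}{\displaystyle\inf_{\theta_1^{(1)},\,\theta_1^{(2)},\,\ldots}\ \sigma^2_{S_{10\cdots0}\cdot S_s S_{s'}\cdots}}. \qquad (3.2)$$

If the usual regularity conditions are assumed it is easily seen that this bound is at least as large as the ordinary Bhattacharyya bound.

For a biased estimate $T$ having $E_\theta(T) = g(\theta)$, the following inequality is obtained (subject to Assumption A):

$$\operatorname{var}_{\theta_0} T \ge \sup_{\theta^{(1)},\,\theta^{(2)},\,\ldots;\ l_s,\,l_{s'},\,\ldots} \frac{\left[\Delta_{\theta_1^{(1)}}\,g(\theta_0) - \sum_s l_s\,\Delta_s\,g(\theta_0)\right]^2}{\operatorname{var}_{\theta_0}\left(S_{10\cdots0} - \sum_s l_s S_s\right)}, \qquad (3.3)$$

where $\Delta_s$ denotes the difference operator that defines $S_s$.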


4. Multistatistic case. For more than one statistic, say $(T_1,\ldots,T_m) = \mathbf{T}$, there is an immediate generalization of the inequality (3.3). It is obtained from the covariance relation that $[\Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}]$ is positive semi-definite, where $\Sigma_{yy}$, $\Sigma_{xx}$ and $\Sigma_{xy}$ are respectively the covariance matrices for a vector $\mathbf{y}$, for a vector $\mathbf{x}$, and between the vectors $\mathbf{x}$ and $\mathbf{y}$. Letting $\mathbf{y}$ be the statistic and $\mathbf{x}$ be a set of the $S$'s defined by (3.1), then $\Sigma_{xy}$ becomes a matrix of differences of $E_\theta(\mathbf{T})$ (as in the numerator on the right hand side of (3.3)).

5. Binomial distribution. For the unbiased estimation of the parameter $p$ of the binomial distribution, the following lower bound for the variance at $p_0$ is obtained using $S_1$ and an interval $h$ for differencing:

$$\operatorname{var}_{p_0} T \ge \frac{h^2}{\left(1 + \dfrac{h^2}{p_0 q_0}\right)^{n} - 1}.$$

The greatest lower bound is obtained by letting $h \to 0$.
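The bound above follows from the identity $\sum_x f^2(x;p_0+h)/f(x;p_0) = (1 + h^2/(p_0q_0))^n$ for the binomial. The Python sketch below is ours and its names are illustrative. It checks that identity by direct summation and shows the bound increasing towards $p_0q_0/n$ as $h \to 0$.

```python
from math import comb


def chi_square_sum(n, p, h):
    """Sum over x of f(x; p + h)^2 / f(x; p) for the binomial(n, p) distribution."""
    q = 1.0 - p
    total = 0.0
    for x in range(n + 1):
        f0 = comb(n, x) * p ** x * q ** (n - x)
        f1 = comb(n, x) * (p + h) ** x * (q - h) ** (n - x)
        total += f1 * f1 / f0
    return total


def binomial_bound(n, p, h):
    """h^2 / [(1 + h^2/(p q))^n - 1]: the bound obtained from S_1 with interval h."""
    q = 1.0 - p
    return h * h / ((1.0 + h * h / (p * q)) ** n - 1.0)


n, p = 10, 0.3
for h in (0.2, 0.05, 0.001):
    print(h, binomial_bound(n, p, h))
print("p q / n =", p * (1 - p) / n)
```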
This greatest lower bound can also be obtained by using the Bhattacharyya bound (3.2) without applying a limiting operation:

$$\operatorname{var}_{p_0} T \ge \frac{p_0 q_0}{n} = \frac{1}{\operatorname{var}_{p_0}\left(S_1 - S_2/2 + \cdots - (-1)^n S_n/n\right)},$$

where the $S$'s are defined (3.1) as the divided differences corresponding to ordinary differences with interval $h$ ($h$ being chosen sufficiently small that all $p$'s fall between 0 and 1).
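The claim that (3.2) reaches $p_0q_0/n$ without a limit can be checked in exact rational arithmetic. The sketch is ours and its names are hypothetical. Following (3.1), $S_i$ is the $i$th divided difference of $f(x;p)$ over the nodes $p_0, p_0+h, \ldots, p_0+ih$, divided by $f(x;p_0)$. The bound $1/\sigma^2_{S_1\cdot S_2\cdots S_n}$ is computed as the $(1,1)$ element of the inverse of the covariance matrix of $S_1, \ldots, S_n$ at $p_0$.

```python
from fractions import Fraction
from math import comb


def pmf(x, n, p):
    """Binomial probability f(x; p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def divided_difference(values, nodes):
    """Newton divided difference [nodes[0], ..., nodes[-1]] of the sampled values."""
    table = list(values)
    for level in range(1, len(nodes)):
        table = [(table[j + 1] - table[j]) / (nodes[j + level] - nodes[j])
                 for j in range(len(table) - 1)]
    return table[0]


def solve(matrix, rhs):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    size = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(size)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[size] for row in rows]


def bhattacharyya_bound(n, p0, h):
    """1 / (residual variance of S_1 on S_2, ..., S_n) for binomial(n, p) at p0."""
    nodes = [p0 + j * h for j in range(n + 1)]
    weights = [pmf(x, n, p0) for x in range(n + 1)]
    S = [[divided_difference([pmf(x, n, q) for q in nodes[: i + 1]], nodes[: i + 1]) / weights[x]
          for x in range(n + 1)]
         for i in range(1, n + 1)]
    # each S_i has mean zero at p0, as shown in Section 3
    assert all(sum(w * s for w, s in zip(weights, row)) == 0 for row in S)
    cov = [[sum(w * a * b for w, a, b in zip(weights, S[i], S[j])) for j in range(n)]
           for i in range(n)]
    unit = [Fraction(1)] + [Fraction(0)] * (n - 1)
    return solve(cov, unit)[0]   # (1,1) element of the inverse covariance matrix


p0, h = Fraction(3, 10), Fraction(1, 20)
for n in (1, 2, 3, 4):
    print(n, bhattacharyya_bound(n, p0, h), p0 * (1 - p0) / n)   # last two columns agree
```

The equality is exact rather than approximate because $f(x;p)$ is a polynomial of degree $n$ in $p$. Its derivative at $p_0$ is therefore a fixed combination of the $n$ divided differences, so the score lies in the span of $S_1, \ldots, S_n$.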


REFERENCES
[1] D. G. Chapman and H. Robbins, “Minimum variance estimation without regularity assumptions,” Annals of Math. Stat., Vol. 22 (1951), pp. 581-586.
[2] E. L. Lehmann, “Notes on the Theory of Estimation,” mimeographed notes.


ON THE ANALYSIS OF SAMPLES FROM k LISTS¹


By Leo A. Goodman
The University of Chicago 

1. Introduction and summary. Suppose we have $k$ lists of names, no name appearing more than once in each list. We are interested in estimating the following parameters: (a) the number of names occurring in common in pairs, triples, $\ldots$, of lists; (b) the number of names occurring in $1, 2, \ldots, k$ lists. This note presents unbiased estimators for these parameters when a random sample is drawn from each list. It is also observed that the estimators presented are the only real-valued statistics which are unbiased estimators of the parameters, and hence must be the minimum variance unbiased estimators. This yields another example in which “insufficient” statistics have been used to obtain minimum variance unbiased estimators.

These unbiased estimators may at times give unreasonable estimates. In such 
cases, it is suggested that the statistics be modified so that the nearest reasonable 
estimate is used. Although this procedure introduces some bias, it usually reduces 
the mean square error. 

This problem arises when we are interested in tracing the interrelations of 
agencies through the individual members. The problem also arises in the work of 
H. H. Fussler and J. M. Dawson of the University Library, University of Chi- 
cago, who are interested in comparing the acquisitions of various libraries. For 
special problems other sampling schemes may be more economical or more 
efficient than taking a sample from each list. Professor F. F. Stephan of Prince- 
ton University pointed out to the author that, in the special case of the “library 
problem,” the Book Catalog and author cards used by many libraries provide a 
convenient means of drawing matched samples. (There is a brief discussion


1 This work was prepared in connection with research supported by the Office of Naval 
Research. 







of this kind of sampling problem on page 571 of [1].) A sampling scheme based on
the last digit or two of the serial number of the cards could be used to search 
each library reference file for the same list of books. Special provision must be 
made for accessions made outside the sampling period and for books not covered 
by the Library of Congress cards. The analysis presented herein deals with the 
case in which (either for good, bad, or no reasons) a random sample has been 
drawn from each list. 

The restriction that no name appear more than once in each list may be 
weakened to obtain somewhat more general results. 

The problem discussed in this paper was brought to the author’s attention by 
Professor W. Allen Wallis of the University of Chicago. 


2. Results.

THEOREM 1. Given $k$ lists of names, let $d_{12}$ names occur in common in lists 1 and 2, $d_{13}$ names occur in common in lists 1 and 3, $\ldots$, $d_{[t]}$ names occur in common in lists $[t]$ (where $[t]$ is some subset containing at least two of the integers $1, 2, \ldots, k$), $\ldots$, $d_{12\cdots k}$ names occur in all lists. Suppose a random sample of $n_i = N_i/g_i$ names is drawn from list $i$, which contains $N_i$ names, for $i = 1, 2, \ldots, k$. If $e_{12}$ names occur in common in the samples from lists 1 and 2, $e_{13}$ names occur in common in the samples from lists 1 and 3, $\ldots$, $e_{[t]}$ names occur in common in the samples from lists $[t]$, $\ldots$, $e_{12\cdots k}$ names occur in all samples, then an unbiased estimator of $d_{[t]}$ is

$$d'_{[t]} = e_{[t]} \prod g_i,$$

where the product is taken over all values of $i$ appearing in $[t]$.
The proof is based on the fact that $e_{[t]} = \sum_j \delta_{j[t]}$, where

$$\delta_{j[t]} = \begin{cases} 1, & \text{if name } j \text{ appears in all the samples from lists } [t], \\ 0, & \text{otherwise,} \end{cases}$$

and the summation is taken over all names.
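Theorem 1 rests on each name of list $i$ entering the sample with probability $n_i/N_i = 1/g_i$, independently across lists. The small exact check below assumes simple random sampling without replacement within each list, which is our reading of "random sample", and uses toy lists of our own. Averaging $e_{[t]}\prod g_i$ over every possible set of samples returns $d_{[t]}$.

```python
from fractions import Fraction
from itertools import combinations, product


def average_estimate(lists, sizes):
    """Average of d'_[t] = e_[t] * prod(g_i) over all possible samples, [t] = all given lists.

    Each list is sampled by simple random sampling without replacement, independently
    of the others; g_i = N_i / n_i.
    """
    factor = Fraction(1)
    for names, n in zip(lists, sizes):
        factor *= Fraction(len(names), n)
    all_samples = [list(combinations(sorted(names), n)) for names, n in zip(lists, sizes)]
    total, count = Fraction(0), 0
    for samples in product(*all_samples):
        common = set(samples[0]).intersection(*samples[1:])   # e_[t] for this outcome
        total += len(common) * factor
        count += 1
    return total / count


lists = [set("abcdef"), set("defg"), set("aefh")]
print(average_estimate(lists[:2], [3, 2]))   # d_12 = 3 (names d, e, f)
print(average_estimate(lists, [2, 2, 2]))    # d_123 = 2 (names e, f)
```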
THEOREM 2. An unbiased estimator of the number of names occurring in $v$ lists is

$$\sum_{i=0}^{k-v} (-1)^i \binom{v+i}{v}\, d'(v+i),$$

where $d'(v+i) = \sum d'_{[t]}$ and the summation is taken over all $[t]$ containing $v+i$ integers. Also, an unbiased estimator of the number of names occurring in at least $v$ lists is

$$\sum_{i=0}^{k-v} (-1)^i \binom{v+i-1}{v-1}\, d'(v+i).$$


The proof of these results follows from Theorem 1 and some combinatorics. 
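The combinatorics behind Theorem 2 is inclusion-exclusion. The sketch below (ours) applies the two alternating sums to the population values $d(j) = \sum_{|[t]| = j} d_{[t]}$ for toy lists and compares them with direct counts. Because the sums are linear, replacing each $d_{[t]}$ by its unbiased estimator $d'_{[t]}$ gives unbiased estimators. For $v = 1$ the sketch takes $d(1) = \sum N_i$, which it assumes is known.

```python
from itertools import combinations
from math import comb


def intersection_sums(lists):
    """d(j): sum over all j-subsets [t] of the number of names common to the lists in [t]."""
    return {j: sum(len(set.intersection(*subset)) for subset in combinations(lists, j))
            for j in range(1, len(lists) + 1)}


def in_exactly(d, k, v):
    """Number of names in exactly v lists: sum_i (-1)^i C(v+i, v) d(v+i)."""
    return sum((-1) ** i * comb(v + i, v) * d[v + i] for i in range(k - v + 1))


def in_at_least(d, k, v):
    """Number of names in at least v lists: sum_i (-1)^i C(v+i-1, v-1) d(v+i)."""
    return sum((-1) ** i * comb(v + i - 1, v - 1) * d[v + i] for i in range(k - v + 1))


lists = [set("abcdefg"), set("defghij"), set("afhjkl"), set("bdhkm")]
k = len(lists)
d = intersection_sums(lists)
membership = [sum(name in lst for lst in lists) for name in set().union(*lists)]
for v in range(1, k + 1):
    print(v, in_exactly(d, k, v), membership.count(v),
          in_at_least(d, k, v), sum(1 for c in membership if c >= v))
```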

THEOREM 3. Let $F$ be a real-valued function of the parameters $d_{12}, d_{13}, \ldots, d_{[t]}, \ldots, d_{12\cdots k}$. Then there can be at most one real-valued function $S$ of the sample results $e_{12}, e_{13}, \ldots, e_{[t]}, \ldots, e_{12\cdots k}$, such that $E\{S\} = F$ for all values of the parameters.
PROOF. Let $2^k - k - 1 = M$. Suppose we order the $M$ subsets $\{[t]\}$. To simplify notation we shall designate $d_{[t]}$ and $e_{[t]}$ by $d_i$ and $e_i$, respectively, where $i = i([t])$ is the rank of the ordered subset $[t]$. The sample space consists of a subset $\{[e_1, e_2, \ldots, e_M]\}$ of $M$-dimensional Euclidean space. Let us order this subset by increasing values of $e_M$; for equal values of $e_M$, we order the vectors by increasing values of $e_{M-1}$; $\ldots$; for equal values of $e_2$, we order the vectors by increasing values of $e_1$. Hence, we may describe the sample space as a sequence $O_1 = [e_1(1), e_2(1), \ldots, e_M(1)]$, $O_2 = [e_1(2), e_2(2), \ldots, e_M(2)]$, $\ldots$, where $O_1$ is the smallest ordered vector, $O_2$ is the next smallest, etc. To each sample point $O_j$ let correspond the parameter point $P_j = [d_1(j), d_2(j), \ldots, d_M(j)]$, where $d_i(j) = e_i(j)$. Let $\Pr\{O_i; P_j\}$ be the probability of obtaining sample point $O_i$ when $P_j$ is the true parameter point. Then it is easy to see that $\Pr\{O_i; P_j\} = 0$ for $i > j$ and $\Pr\{O_j; P_j\} > 0$. Hence, any unbiased estimate $S(O_i)$ of a function $F(P)$, defined on the parameter space $P$, must be such that

$$\sum_{i=1}^{j} S(O_i)\,\Pr\{O_i; P_j\} = F(P_j)$$

for $j = 1, 2, 3, \ldots$. This necessary condition ensures the uniqueness of $S(O_i)$, since $S(O_i)$ must satisfy the recursion relation associated with the necessary condition.
In order to calculate the variance of these statistics, we again consider the estimators in terms of $\delta$'s. We then see that the variance of $d'_{[t]}$ is

$$\sigma^2_{d'_{[t]}} = d_{[t]}\Bigl\{\prod g_i - 1\Bigr\},$$

where the product is taken over all values of $i$ appearing in $[t]$, which permits the calculation of standard errors for the estimators. Similar results may be obtained for the other estimators presented.

By Theorem 3 we see that if one wishes to have unbiased estimators, then us- 
ing the results of Theorems 1 and 2 is the best possible move. That is, the statis- 
tics described in those theorems are the only unbiased estimators of the param- 
eters, and hence must be minimum variance unbiased estimators. The reader 
may have observed that $e_{[t]}$ is not a sufficient statistic for $d_{[t]}$. We see, therefore, that minimum variance unbiased estimators have been obtained using statistics which are not sufficient.
THE STANDARD ERROR OF GINI’S MEAN DIFFERENCE


By Z. A. Lomnicki
Polish University College, London


A general expression for the standard error of Gini's mean difference $g$ was given in a paper under the same title by U. S. Nair [1]. See also [2], pp. 216-217.

The object of this note is to deduce in a more direct way a simpler formula 
for the variance of this statistic. The expression obtained is equivalent to that 
given by Nair except for an additional term overlooked in his final formula. The 
simplification is due to the fact that, for the evaluation of the expected values of 
g and g’, it is not necessary to arrange the sample values in ascending order of 
magnitude as done by Nair. 

Let $n$ be the size of the sample, $f(x)$ the probability density function of the parent population, $\mu$ the mean and $\sigma^2$ the variance of $x$ in the parent, and let

$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad Z(x) = \int_{-\infty}^{x} t f(t)\,dt.$$


From the definition

$$g = \frac{1}{n(n-1)} \sum_{i=1}^{n}\sum_{j=1}^{n} |x_i - x_j| \qquad (1)$$


(where the values x; are not in order of magnitude but are numbered as they 
appear in the sample), we have 


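$$E(g) = \frac{1}{n(n-1)} \sum_{i=1}^{n}\sum_{j=1}^{n} E(|x_i - x_j|) = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} |x - y|\,f(x)f(y)\,dx\,dy = \Delta, \qquad (2)$$

where $\Delta$ is the mean difference (parameter) of the parent population. It is easy to check that $\Delta$ can also be written

$$\Delta = 2\int_{-\infty}^{+\infty} \{xF(x) - Z(x)\}\,f(x)\,dx = 2\int_{-\infty}^{+\infty} x f(x)\bigl(2F(x) - 1\bigr)\,dx. \qquad (3)$$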


In order to find $E(g^2)$ let us write

$$g^2 = \frac{4}{n^2(n-1)^2}\Bigl\{\sum (x_i - x_j)^2 + 2\sum |x_i - x_j|\,|x_i - x_k| + 2\sum |x_i - x_j|\,|x_k - x_l|\Bigr\}. \qquad (4)$$

The first sum should be read as the double sum extended to all pairs of different subscripts $i, j$, and has $n(n-1)/2$ terms; the second as a triple sum extended to all combinations of two pairs $(i, j)$, $(i, k)$ of different subscripts $i, j, k$ and has $n(n-1)(n-2)/2$ terms; the third as a quadruple sum extended to all combinations of two pairs $(i, j)$, $(k, l)$ of different indices $i, j, k, l$ and has $n(n-1)(n-2)(n-3)/8$ terms. Thus


$$E(g^2) = \frac{1}{n(n-1)}\Bigl\{2E(x_i - x_j)^2 + 4(n-2)E\bigl(|x_i - x_j|\,|x_i - x_k|\bigr) + (n-2)(n-3)E\bigl(|x_i - x_j|\,|x_k - x_l|\bigr)\Bigr\}. \qquad (5)$$

The first expected value is equal to $2\sigma^2$; the third to $\Delta^2$. Denoting the second by $J$ we have

$$\operatorname{var}(g) = E(g^2) - \Delta^2 = \frac{1}{n(n-1)}\bigl\{4\sigma^2 + 4(n-2)J - 2(2n-3)\Delta^2\bigr\}, \qquad (6)$$

where

$$J = \int\!\!\int\!\!\int |x - y|\,|x - z|\,f(x)f(y)f(z)\,dx\,dy\,dz. \qquad (7)$$


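This can be written as

$$J = \int_{-\infty}^{+\infty} f(x)\Bigl[\int_{-\infty}^{x}\!\int_{-\infty}^{x} (x-y)(x-z)f(y)f(z)\,dy\,dz + \int_{-\infty}^{x}\!\int_{x}^{+\infty} (x-y)(z-x)f(y)f(z)\,dz\,dy$$

$$+ \int_{x}^{+\infty}\!\int_{-\infty}^{x} (y-x)(x-z)f(y)f(z)\,dz\,dy + \int_{x}^{+\infty}\!\int_{x}^{+\infty} (y-x)(z-x)f(y)f(z)\,dy\,dz\Bigr]dx. \qquad (8)$$

Putting

$$G(x) = \int_{-\infty}^{x} (x-y)f(y)\,dy = xF(x) - Z(x), \qquad (9)$$

$$H(x) = \int_{x}^{+\infty} (y-x)f(y)\,dy = G(x) + \mu - x, \qquad (10)$$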


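we obtain

$$J = \int_{-\infty}^{+\infty} \bigl[G^2(x) + 2G(x)H(x) + H^2(x)\bigr] f(x)\,dx = \int_{-\infty}^{+\infty} \bigl(G(x) - H(x)\bigr)^2 f(x)\,dx + 4\int_{-\infty}^{+\infty} G(x)H(x)f(x)\,dx, \qquad (11)$$

Since $G(x) - H(x) = x - \mu$, the first integral in (11) equals $\sigma^2$, and finally

$$\operatorname{var}(g) = \frac{1}{n(n-1)}\bigl\{4(n-1)\sigma^2 + 16(n-2)I - 2(2n-3)\Delta^2\bigr\}, \qquad (12)$$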


































where

$$I = \int_{-\infty}^{+\infty} G(x)H(x)f(x)\,dx = \int_{-\infty}^{+\infty} \bigl\{[xF(x) - Z(x)]^2 + (\mu - x)[xF(x) - Z(x)]\bigr\} f(x)\,dx. \qquad (13)$$

This integral can also be written as

$$I = \iiint_{y<x<z} (x-y)(z-x)\,f(x)f(y)f(z)\,dx\,dy\,dz, \qquad (14)$$


and, according to the distribution involved, formula (13) or (14) may be more convenient in the evaluation of $\operatorname{var}(g)$.

Comparing (12) with the formulae given by Nair it is easy to show that an additional term $(n-3)\mu^2$ has been omitted in his final formula for $\sigma_g^2$. However, the values of $\operatorname{var}(g)$ for normal, exponential and rectangular distributions given in [1] are correct and agree with those obtained from formula (12) above.
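Formula (12) can be checked in exact arithmetic for a small discrete parent population. This sketch reads the integrals as expectations, with $G(x) = E(x-Y)^+$ and $H(x) = E(Y-x)^+$. That reading is an assumption of the sketch, since the note itself treats a density. The Python below, with illustrative names, enumerates every sample of size $n$ and compares the exact variance of $g$ with (12).

```python
from fractions import Fraction
from itertools import product


def gini_mean_difference(sample):
    """g of (1): average of |x_i - x_j| over ordered pairs i != j (integer data)."""
    n = len(sample)
    return Fraction(sum(abs(a - b) for a in sample for b in sample), n * (n - 1))


def var_by_enumeration(support, probs, n):
    """Exact var(g) by summing over every ordered sample of size n."""
    mean = second = Fraction(0)
    for idx in product(range(len(support)), repeat=n):
        p = Fraction(1)
        for i in idx:
            p *= probs[i]
        g = gini_mean_difference([support[i] for i in idx])
        mean += p * g
        second += p * g * g
    return second - mean ** 2


def var_by_formula_12(support, probs, n):
    pts = list(zip(support, probs))
    mu = sum(p * x for x, p in pts)
    sigma2 = sum(p * (x - mu) ** 2 for x, p in pts)
    delta = sum(p * q * abs(x - y) for x, p in pts for y, q in pts)

    def G(x):   # x F(x) - Z(x) = E (x - Y)^+
        return sum(q * (x - y) for y, q in pts if y < x)

    def H(x):   # G(x) + mu - x = E (Y - x)^+
        return sum(q * (y - x) for y, q in pts if y > x)

    I = sum(p * G(x) * H(x) for x, p in pts)
    return (4 * (n - 1) * sigma2 + 16 * (n - 2) * I - 2 * (2 * n - 3) * delta ** 2) / (n * (n - 1))


support = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
for n in (2, 3, 4, 5):
    print(n, var_by_enumeration(support, probs, n), var_by_formula_12(support, probs, n))
```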
CORRECTION TO “A NOTE ON THE POWER OF A NONPARAMETRIC
TEST” 
By F. J. Massey, Jr. 
University of Oregon 

In the paper mentioned in the title (Annals of Math. Stat., Vol. 21 (1950), pp. 440-443) the proof of the biasedness of a test based on the maximum deviation between sample and population cumulatives is incorrect. A proof is given below. Also, on page 442, line 2, “greater” should be replaced by “less”. The notation refers to Fig. 1 of the original article.

Above point $b$ (note $F_1(b) = F_0(b)$), there will be certain possible heights for $S_n(x)$ to attain and still remain in the band. Call these heights $h_1 = 1/n, h_2, h_3, \ldots, h_k = k/n$, where $k/n < 2d/\sqrt{n}$. Locate the point $x = c$ $(c < b)$ close enough to $x = b$ so that $F_0(c) + d/\sqrt{n} > h_k$. Then consider

(i) $P_0 = P\{S_n(x) \text{ remains in band} \mid F(x) = F_0(x)\}$,
(ii) $P_1 = P\{S_n(x) \text{ remains in band} \mid F(x) = F_1(x)\}$.

Now

$$P_j = \sum_{i=1}^{k} P\{S_n(x) \text{ passes through } h_i \text{ and remains in band} \mid F_j(x)\}$$

$$= \sum_{i=1}^{k} P\{S_n(x) \text{ goes through } h_i \mid F_j(x)\} \cdot P\{S_n(x) \text{ stays in band for } x < b \mid F_j(x),\ S_n(x) \text{ goes through } h_i\}$$

$$\cdot\, P\{S_n(x) \text{ stays in band for } x > b \mid F_j(x),\ S_n(x) \text{ goes through } h_i \text{ and is in band for } x < b\}.$$

However the first and third of the factors are the same for $j = 0, 1$, and the second is unity for $j = 1$, and therefore $P_0 \le P_1$. If $d/\sqrt{n} > 1/n$ (which is necessary if the test is not always going to reject) then at least for height $h_1$,

$$P\{S_n(x) \text{ inside the band for } x < b \mid S_n(b) = h_1,\ F_0(x)\} < 1.$$

Thus the test is biased.
I would like to thank Professor D. A. Darling for pointing out the error.
ABSTRACTS OF PAPERS


(Abstracts of papers presented at the East Lansing meeting of the Institute, 
September 2-5, 1952) 


1. An Extension of Massey's Distribution of the Maximum Deviation between Two Sample Cumulative Step Functions. (Preliminary Report.) Chia Kuei Tsao, Wayne University.


Let $x_1 < x_2 < \cdots < x_n$ and $y_1 < y_2 < \cdots < y_m$ be the ordered observations of two random samples from populations having cumulative distribution functions $F(x)$ and $G(x)$ respectively. Let $S_n(x) = k/n$, where $k$ is the number of observations of $X$ which are less than or equal to $x$, and $S'_m(x) = j/m$, where $j$ is the number of observations of $Y$ which are less than or equal to $x$. The statistics $d_r = \max |S_n(x) - S'_m(x)|$ (max over $x \le x_r$) and $d'_r = \max |S_n(x) - S'_m(x)|$ (max over $x \le \max(x_r, y_r)$) can be used to test the hypothesis $F(x) = G(x)$. For example, using $d_r$ we would reject the hypothesis if the observed value of $d_r$ is significantly large. In this paper, the methods of obtaining the distributions of $d_r$ and $d'_r$ (for small size samples) are similar to that in Massey's paper, and several short tables for equal size samples are included. (Work supported by the Office of Naval Research.)


2. Polynomial Correlation Coefficients. W. D. Baten and J. S. Frame, Michigan State College.


In this paper is developed a formula for the correlation coefficient pertaining to predict- 
ing polynomials. It is shown, when the independent variates are approximately normally 
distributed, that the square of this correlation coefficient can be expressed as a finite sum 
involving the squares of the averages of the derivatives of the estimating polynomial, 
namely, r? = Dye" k!, where y represents the predicting polynomial. The proof is based 
upon manipulations of Bernoulli numbers. 


3. Truncated Poisson Distributions. Paul R. Rider, Wright-Patterson Air Force Base and Washington University.

This paper gives a method for estimating the parameter of truncated Poisson distributions for which some of the data are missing, particularly those which are truncated at the lower end. Application to a number of actual distributions is discussed.


4. Frequency Distributions for Functions of Rectangularly Distributed Random Variables. Stuart T. Happen, Socony-Vacuum Laboratories, Paulsboro, New Jersey.

The theory of rectangularly distributed random variables is presented. It is shown how
such random variables can occur in a certain class of controlled experiments arising in the 
fields of physics, chemistry, and engineering. On the basis of rectangularly distributed 
random variables, arising in observations or process variables, frequency distributions are 
developed for quantities which are functions of such variables. The principal method used
in deriving the frequency distributions is the operational method of the Laplace transform.
Example applications illustrate how such frequency distributions can be applied in the 
analysis of experimental variance. 


5. On Truncated Rules of Action. (Preliminary Report.) Bensamin Epstern, 
Wayne University. 


A rule of action of theoretical and practical interest in life testing can be described as
follows: (a) Non-replacement. Start the life test with $n$ items drawn from a population.
Let an integer $r_0$ and a truncation time $T_0$ be preassigned. By the nature of the experiment,
failures occur in order. Let $X_{r_0,n}$ be the time when the $r_0$th ordered failure occurs. If
$X_{r_0,n} < T_0$, stop the experiment at $X_{r_0,n}$ and take action I. If $X_{r_0,n} > T_0$, stop the
experiment at $T_0$ and take action II. (b) Replacement. Same as non-replacement, except that a
failed item is replaced at once by a new item. The properties of this kind of rule are
investigated in detail when the underlying pdf is of the form $(1/\theta)e^{-x/\theta}$, $x > 0$, a distribution of
some interest in life testing. The distributions of $r$, the number of items destroyed before
taking an action, and $T$, the length of the experiment, are obtained. In particular, $L(\theta)$, the
probability of taking action I (say), $E_\theta(r)$, and $E_\theta(T)$ are obtained. Some tables based on
this theory have been computed. (Work supported by the Office of Naval Research.)
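Under the exponential model the probability L(θ) of taking action I has a simple closed form, which may help in reading the rule above. Without replacement, each of the n items fails by time T_0 independently with probability p = 1 − e^(−T_0/θ), so action I is taken exactly when at least r_0 of them do. With replacement, failures at the n test positions form a Poisson process of rate n/θ. The sketch encodes these two standard facts; the function names and parameter values are hypothetical, and the tables mentioned in the abstract are not reproduced.

```python
from math import comb, exp, factorial


def prob_action_I_without_replacement(n, r0, T0, theta):
    """L(theta) = P(X_{r0,n} < T0): at least r0 of n exponential lives end by T0."""
    p = 1 - exp(-T0 / theta)  # chance that a single item fails by T0
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r0, n + 1))


def prob_action_I_with_replacement(n, r0, T0, theta):
    """L(theta) with immediate replacement: failures form a Poisson process of rate n/theta."""
    mu = n * T0 / theta  # expected number of failures by T0
    return 1 - sum(exp(-mu) * mu**k / factorial(k) for k in range(r0))


for theta in (50.0, 100.0, 200.0):
    print(theta,
          round(prob_action_I_without_replacement(20, 5, 30.0, theta), 4),
          round(prob_action_I_with_replacement(20, 5, 30.0, theta), 4))
```

Replacement can only add failures at each test position, so the second probability is never smaller than the first.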


6. The Distribution of the Difference of Two Independent Chi-Squares. James
Pachares, University of North Carolina.


As a special case of the problem of the distributions of quadratic forms being investigated
by the author, let $T_n = X_n - Y_n$, where $X_n$ and $Y_n$ are independently and identically
distributed with probability density function (pdf) $[\Gamma(n/2)]^{-1} e^{-u} u^{(n-2)/2}$, $u > 0$. If $f_n(t)$ denotes
the pdf of $T_n$, then the following recurrence equation holds:
$f_{n+4}(t) = \{(n+1)/(n+2)\} f_{n+2}(t) + \{1/[n(n+2)]\} t^2 f_n(t)$, $n = 1, 2, \dots$. The exact distribution of $T_n$ is
derived. If $K_n(t)$ is the modified Bessel function of the second kind of order $n$, then
$f_n(t) = \pi^{-1/2}[\Gamma(n/2)]^{-1} (|t|/2)^{(n-1)/2} K_{(n-1)/2}(|t|)$, $n = 1, 2, \dots$. Recurrence relations between the cumulative
distribution functions (cdf's) of $T_n$ are established so that any cdf for odd $n$ depends on
$F_1(z)$, while any cdf for even $n$ depends on $F_2(z)$, where $F_n(z) = \Pr\{|T_n| \le z\}$. A method
is given for evaluating $F_1(z)$ by a series, with bounds on the error committed by stopping
with a given term. Upper and lower bounds for $F_n(z)$ are given. (Work sponsored by the
Office of Naval Research.)
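The closed form and the recurrence can be checked numerically. With the density given above, X_n and Y_n are each half of a chi-square variable with n degrees of freedom, so T_n is half the difference of two chi-squares. The sketch evaluates K_ν through the standard integral representation K_ν(z) = ∫_0^∞ e^(−z cosh s) cosh(νs) ds (z > 0), truncated and summed by the trapezoidal rule so that only the standard library is needed. The truncation point, the step count, and the function names are our own choices.

```python
import math


def bessel_k(nu, z, upper=12.0, steps=3000):
    """K_nu(z) for z > 0 from the integral of exp(-z cosh s) cosh(nu s) over s >= 0.

    The integrand is even and decays double-exponentially, so the trapezoidal rule on
    [0, upper] is very accurate for moderate z and nu.
    """
    h = upper / steps
    total = 0.5 * math.exp(-z)  # half weight at s = 0, where cosh(0) = 1
    for i in range(1, steps + 1):
        s = i * h
        weight = 0.5 if i == steps else 1.0
        total += weight * math.exp(-z * math.cosh(s)) * math.cosh(nu * s)
    return total * h


def diff_pdf(t, n):
    """f_n(t) = pi^(-1/2) [Gamma(n/2)]^(-1) (|t|/2)^((n-1)/2) K_{(n-1)/2}(|t|), for t != 0."""
    z = abs(t)
    nu = (n - 1) / 2
    return (z / 2) ** nu * bessel_k(nu, z) / (math.sqrt(math.pi) * math.gamma(n / 2))


t = 1.3
for n in (1, 2, 3):
    lhs = diff_pdf(t, n + 4)
    rhs = (n + 1) / (n + 2) * diff_pdf(t, n + 2) + t**2 / (n * (n + 2)) * diff_pdf(t, n)
    print(n, lhs, rhs)  # the two columns should agree to many digits
```

For n = 2 the formula reduces to the Laplace density e^(−|t|)/2, which gives an independent check.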


7. Partially Balanced Designs with Two Plots per Block. R. C. Bose, University
of North Carolina, and K. R. Nair, University of North Carolina
and Forest Research Institute, Dehradun, India.


In many experimental situations the block size is compulsorily restricted to two, as in
comparing treatments given to two halves of a leaf. Partially balanced designs requiring
only a small number of replications and with $m$ accuracies, $m \le 4$, have been worked out.
It has been noticed that the association schemes of any known partially balanced incom- 
plete block design with block size greater than 2 will lead to a design of the same type with 
block size 2, but a larger number of replications. 
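One way to see how an association scheme can yield a design with blocks of size 2 (our own illustration, not necessarily the authors' construction) is to take as blocks all pairs of treatments that are first associates. Each treatment then occurs in n_1 blocks (its number of first associates), first associates meet exactly once, and other pairs never meet. The sketch uses a group-divisible scheme with hypothetical parameters, treating treatments in different groups as first associates.

```python
from collections import Counter
from itertools import combinations


def block_size_two_design(treatments, first_associates):
    """Blocks are all unordered pairs of treatments that are first associates."""
    return [pair for pair in combinations(treatments, 2) if first_associates(*pair)]


# Group-divisible scheme: g groups of m treatments; treatments in different groups
# are taken as first associates (an illustrative choice).
g, m = 3, 2
treatments = [(group, j) for group in range(g) for j in range(m)]
blocks = block_size_two_design(treatments, lambda a, b: a[0] != b[0])

replication = Counter(t for block in blocks for t in block)
print(len(blocks), set(replication.values()))  # every treatment replicated (g - 1) * m times
```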


8. Minimax Sampling and Estimation in Finite Populations. Om Prakash
Aggarwal, Stanford University.


Stratified and cluster sampling from a finite population are considered from the Bayes and
minimax points of view. The loss in estimating the mean is taken to be the cost of
observations plus the squared error in the estimate of the mean. For stratified sampling with a
linear cost function, for instance, it is shown that the minimax sampling plan chooses
$n_i = \{\sqrt{N_i^2\sigma_i^2/c_i + 1/4}\}$ individuals at random from the $i$th stratum and uses the usual
estimate $\sum_{i=1}^{k} N_i\bar{X}_i$ for estimating $\sum_{i=1}^{k} N_i\mu_i$, where $k$ is the number of strata and, in the
$i$th stratum, $N_i$ denotes the total number of individuals, $\mu_i$ and $\sigma_i^2$ the mean and variance,
$c_i$ the cost of sampling per individual, and $\bar{X}_i$ the sample mean; $\{q\}$ denotes the integer nearest to $q$.
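The integer rule can be checked directly. Assuming, as the abstract suggests, that the part of the risk depending on n_i is N_i²σ_i²/n_i + c_i n_i, comparing n_i with n_i + 1 shows that the optimum is the smallest n_i with n_i(n_i + 1) ≥ N_i²σ_i²/c_i, which is the integer nearest to √(N_i²σ_i²/c_i + 1/4). The sketch ignores the constraint n_i ≤ N_i, and its names and numbers are illustrative.

```python
import math


def stratum_sample_size(N, sigma, c):
    """Integer nearest to sqrt(N^2 sigma^2 / c + 1/4); the constraint n <= N is ignored."""
    return math.floor(math.sqrt((N * sigma) ** 2 / c + 0.25) + 0.5)


def stratum_risk(n, N, sigma, c):
    """Part of the risk that depends on n: squared-error term plus sampling cost."""
    return (N * sigma) ** 2 / n + c * n


N, sigma, c = 100, 2.0, 50.0
n_rule = stratum_sample_size(N, sigma, c)
n_brute = min(range(1, N + 1), key=lambda n: stratum_risk(n, N, sigma, c))
print(n_rule, n_brute)  # both 28 for these values
```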


9. Some Two Sample Tests on the Exponential Distribution. (Preliminary Report.)
Benjamin Epstein and Chia Kuei Tsao, Wayne University.


Let $S_{1,n_1}$ and $S_{2,n_2}$ be two random samples such that $S_{i,n_i}$ is a sample of size $n_i$ from a
population having pdf $(1/\theta_i)\exp[-(x - A_i)/\theta_i]$, $x \ge A_i$ ($i = 1, 2$). Let $S_{i,r_i}$ be the set of the
first $r_i$ ($r_i \le n_i$) smallest observations in $S_{i,n_i}$. On the basis of $S_{1,r_1}$ and $S_{2,r_2}$, various
likelihood ratio tests about the parameters involved can be obtained. The likelihood ratio tests
of the hypothesis $\theta_1 = \theta_2$, assuming $A_1$ and $A_2$ either known or unknown, are reducible to
the well-known F-test. The test criterion for the hypothesis $A_1 = A_2$, assuming $\theta_1$ and $\theta_2$
known, may be reduced to a random variable having an exponential distribution. The tests
of the hypothesis $A_1 = A_2$, assuming $\theta_1$ and $\theta_2$ unknown, are also reduced to F-tests.
Finally, the test of the hypothesis $A_1 = A_2$ and $\theta_1 = \theta_2$ is obtained for the special case $r_1 = r_2$.
(Work supported by the Office of Naval Research.)
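For the scale hypothesis θ_1 = θ_2 with known locations, the reduction to an F-test can be sketched with the usual total-time-on-test statistic. For the first r_i ordered observations of a sample of n_i, U_i = Σ_{j ≤ r_i}(x_(j) − A_i) + (n_i − r_i)(x_(r_i) − A_i), and 2U_i/θ_i is chi-square with 2r_i degrees of freedom. Under θ_1 = θ_2 the ratio (U_1/r_1)/(U_2/r_2) therefore has an F distribution with (2r_1, 2r_2) degrees of freedom. This standard result is offered for illustration, not as a transcription of the paper, and the function names are hypothetical.

```python
import random


def total_time_on_test(sample, r, A=0.0):
    """U = sum of the r smallest (x - A) plus (n - r) * (x_(r) - A)."""
    x = sorted(sample)
    n = len(x)
    return sum(v - A for v in x[:r]) + (n - r) * (x[r - 1] - A)


def scale_ratio_statistic(sample1, r1, sample2, r2, A1=0.0, A2=0.0):
    """Return (theta1_hat / theta2_hat, (2 r1, 2 r2)); F-distributed when theta1 = theta2."""
    theta1_hat = total_time_on_test(sample1, r1, A1) / r1
    theta2_hat = total_time_on_test(sample2, r2, A2) / r2
    return theta1_hat / theta2_hat, (2 * r1, 2 * r2)


# Monte Carlo check: under equal scales the statistic should average d2 / (d2 - 2).
rng = random.Random(1)
reps = 20000
mean_f = sum(
    scale_ratio_statistic([rng.expovariate(1.0) for _ in range(10)], 3,
                          [rng.expovariate(1.0) for _ in range(8)], 5)[0]
    for _ in range(reps)
) / reps
print(mean_f, 10 / 8)
```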


10. Efficiency of Estimators of the Mean of an Exponential Distribution Based
Only on the rth Smallest Observation in an Ordered Sample. Benjamin
Epstein, Wayne University.


Let us assume that the lives of certain items are describable by a positive random
variable $X$, whose pdf is $f(x; \theta) = (1/\theta)e^{-x/\theta}$, $x > 0$. A sample of size $n$ is drawn, and we suppose
that the observations become available in order. Let the experiment be terminated at $x_{r,n}$,
the time of failure of the $r$th item. We raise the question: how much information is lost
if we base our estimate of the unknown parameter $\theta$ only on $x_{r,n}$, instead of basing it on all
the first $r$ failure times $x_{i,n}$, $i = 1, 2, \dots, r$? As reported recently, the m.l. estimate based
on the $x_{i,n}$ is given by $\hat\theta_{r,n} = U/r$, where $U = \sum_{i=1}^{r} x_{i,n} + (n - r)x_{r,n}$. This estimate is ‘‘best’’
in the sense that it is unbiased, minimum variance, efficient, and sufficient. It is shown that
unbiased estimates of $\theta$ based on $x_{r,n}$ alone have high efficiencies ($\ge .9$) relative to $\hat\theta_{r,n}$
for values of $r \le 2n/3$. For example, for $r = n/2$ ($n$ an even integer), the efficiency approaches
$2(\log 2)^2 \approx .961$ as $n$ grows. Tables giving the unbiasing constants $\beta_{r,n}$ such that $E(\beta_{r,n}x_{r,n}) = \theta$,
$\mathrm{Var}(\beta_{r,n}x_{r,n})$, and the efficiencies $\mathrm{Var}(\hat\theta_{r,n})/\mathrm{Var}(\beta_{r,n}x_{r,n})$ have been obtained for $n = 1(1)20(5)30(10)100$
and $r = 1(1)n$. (Work supported by the Office of Naval Research.)
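The unbiasing constants and efficiencies follow from the representation of exponential order statistics as sums of independent spacings: E(x_{r,n}) = θ Σ_{j=1}^r 1/(n − j + 1) and Var(x_{r,n}) = θ² Σ_{j=1}^r 1/(n − j + 1)², while Var(θ̂_{r,n}) = θ²/r. The sketch below carries out that computation. It is our own, not the paper's tables, and the function name is hypothetical.

```python
import math


def unbiasing_constant_and_efficiency(r, n):
    """beta_{r,n} with E(beta * x_{r,n}) = theta, and Var(theta_hat_{r,n}) / Var(beta * x_{r,n})."""
    s1 = sum(1 / (n - j + 1) for j in range(1, r + 1))       # E(x_{r,n}) / theta
    s2 = sum(1 / (n - j + 1) ** 2 for j in range(1, r + 1))  # Var(x_{r,n}) / theta^2
    beta = 1 / s1
    efficiency = (1 / r) / (beta**2 * s2)  # equals s1^2 / (r * s2)
    return beta, efficiency


for n in (10, 100, 1000):
    print(n, unbiasing_constant_and_efficiency(n // 2, n))
print(2 * math.log(2) ** 2)  # large-n limit of the efficiency for r = n/2
```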


11. On the Theory of Systematic Sampling. II. William G. Madow, University
of Illinois.


It is shown that if the elements of the population are constants and the population is
monotone, then centered systematic sampling is more efficient than random start systematic
sampling; and that if the elements of the population are random variables and the
correlogram is monotone decreasing, then centered systematic sampling is more efficient than
random start systematic sampling, while if the correlogram is monotone increasing the
contrary is true.
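The first statement can be illustrated with a small monotone population of constants. With N = nk units, random-start systematic sampling picks a start uniformly among 1, ..., k. Centered systematic sampling, taken here as an assumption to mean the middle start (k + 1)/2 with k odd, is deterministic, so its mean squared error is just its squared bias. The population below is made up.

```python
def systematic_means(y, k):
    """Sample means for each of the k possible starts (units start, start + k, ...)."""
    return [sum(y[s::k]) / len(y[s::k]) for s in range(k)]


def compare_systematic(y, k):
    """Return (MSE of random-start, MSE of centered) systematic sample means; k odd."""
    assert len(y) % k == 0 and k % 2 == 1
    pop_mean = sum(y) / len(y)
    means = systematic_means(y, k)
    mse_random = sum((m - pop_mean) ** 2 for m in means) / k
    mse_centered = (means[(k - 1) // 2] - pop_mean) ** 2  # middle start, 0-based index
    return mse_random, mse_centered


y = [i**2 for i in range(1, 16)]  # monotone population, N = 15
print(compare_systematic(y, 3))   # the centered MSE is far smaller here
```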


12. The Power of Some Service Tests. Leo A. Goodman, University of Chicago.


George W. Brown and Merrill M. Flood have presented in an interesting paper (‘‘Tumbler 
Mortality”, Jour. Am. Stat. Assn., Vol. 42 (1947), pp. 567-574) the results of an analysis of 
a service test that was used to determine which of two types of glass tumblers had a longer 
mean length of life when used in a particular cafeteria. At the end of each week, each 
broken tumbler was recorded and replaced by a new one of the same type. Another kind of 
service test is based on the procedure of replacing the tumblers in equal numbers; i.e., as 
many of type 1 as of type 2, even though they broke in unequal numbers. Still another kind 
of service test is based on the procedure of replacing each broken tumbler by a new one of 
the other type. Each of the two procedures just suggested may be performed using either
weekly records or only the final count (the latter is less powerful but requires less work). The exact
power of these service tests is computed under the assumption of constant risk. The asymp- 
totic power is computed in the more general case (non-constant risk). The several service 
tests are compared. This information may be used by the experimenter to decide which one 
of these tests to perform, and when to conclude the test. 


13. A Minimal Essentially Complete Class of Tests of a Simple Hypothesis
Specifying the Mean of a Unit Rectangular Distribution. Allan Birnbaum,
Columbia University.


For the problem of testing a simple hypothesis on the mean of a unit rectangular dis- 
tribution, on the basis of $n$ ($n \ge 2$) observations, explicit characterizations of the minimal
complete class and a minimal essentially complete class of tests are given. Examples of 
tests which are best against various classes of alternatives are given; it is shown that the 
test with highest power against alternatives far from the null hypothesis has minimum 
power against alternatives close to the null hypothesis. 


14. Application of Random Walk Theory to a General Class of Sequential
Decision Problems. (Preliminary Report.) G. E. Albert, University of
Tennessee.


One of $r$ decisions $d_i$, $i = 1, 2, \dots, r$, is to be made concerning the conditional cdf $F(y \mid x)$
of $y$ given $x$, $x$ and $y$ in a Euclidean space $R$, by the following sequential experiment. Assign
$r + 1$ nonnegative functions $p_i(x)$, $i = 0, 1, 2, \dots, r$, on $R$ with $\sum_{i=0}^{r} p_i(x) = 1$. Perform a
random walk beginning at an arbitrary point $x_0$, with successive points $x_{j+1}$ drawn from
$F(x_{j+1} \mid x_j)$, $j = 0, 1, 2, \dots$, and terminating as soon as one of $d_i$, $i = 1, 2, \dots, r$, has been
decided under the following rule: let $d_0$ denote the decision to continue experimentation
after any step $x_j$ of the walk; at each step $x_j$, $j = 0, 1, 2, \dots$, one of the decisions
$d_i$, $i = 0, 1, 2, \dots, r$, is made with respective probabilities $p_i(x_j)$. Let $P_i(x)$ denote
the probability of making the decision $d_i$, $i = 1, 2, \dots, r$, as a result of a walk starting at $x$.

It is shown that under certain mild restrictions $P_i(x) = p_i(x) + p_0(x)\int_R P_i(y)\,dF(y \mid x)$.


Also, the moment generating function and the moments of the duration of the experiment 
satisfy integral equations of a similar type; see Wasow, ‘‘On the duration of random walks.”’ 
Annals of Math. Stat., Vol. 22 (1951), pp. 199-216, for a special case. Some methods of ap- 
proximating the solutions of these integral equations are established. Application of the 
theory is illustrated by a discussion of the sequential probability ratio test of hypotheses 
$\theta_1$, $\theta_2$ on the parameter $\theta$ of a general class of cdf $G(x; \theta)$ which possesses a sufficient statistic
for the parameter.
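On a finite state space the integral in the equation for P_i(x) becomes a sum, and the equation can be solved by successive substitution, P ← p_i + p_0 · (F P), which converges when the walk terminates with probability 1. The sketch below is such a discrete analogue (our simplification, not the paper's approximation methods). With a symmetric ±1 walk on 0, ..., M that decides d_1 at M and d_2 at 0, it should reproduce the gambler's-ruin answer P_1(x) = x/M.

```python
def decision_probabilities(p, transition, i, sweeps=5000):
    """Solve P_i(x) = p_i(x) + p_0(x) * sum_y P_i(y) F(y | x) by successive substitution.

    p[j][x] is the probability of decision d_j at state x (j = 0 means 'continue');
    transition[x][y] is the one-step probability F(y | x) on a finite state space.
    """
    states = range(len(transition))
    P = [0.0 for _ in states]
    for _ in range(sweeps):
        P = [p[i][x] + p[0][x] * sum(transition[x][y] * P[y] for y in states) for x in states]
    return P


M = 10
transition = [[0.0] * (M + 1) for _ in range(M + 1)]
for x in range(1, M):
    transition[x][x - 1] = transition[x][x + 1] = 0.5
transition[0][0] = transition[M][M] = 1.0  # never used: the walk stops at the ends
p_continue = [0.0] + [1.0] * (M - 1) + [0.0]
p_d1 = [0.0] * M + [1.0]                   # decide d_1 at the upper end
p_d2 = [1.0] + [0.0] * M                   # decide d_2 at the lower end
P1 = decision_probabilities([p_continue, p_d1, p_d2], transition, 1)
print([round(v, 6) for v in P1])           # approximately x / M
```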


15. Nonparametric Comparisons of Populations when Data Are Collected in
Homogeneous Groups. Frank J. Massey, Jr., University of Oregon.


Methods of comparing two populations when data are paired have been fairly widely
studied; examples are the sign test and the t-test on differences. This paper presents similar
techniques for analyzing data which have been collected in groups of size larger than one from
each of several populations. Comparisons of power curves are made for certain normal
alternatives. (Work sponsored by the Office of Naval Research.)
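For comparison with the group methods, the classical paired sign test mentioned first can be sketched as follows. Under the null hypothesis of a zero median difference, the number of positive differences among the nonzero ones is binomial with p = 1/2. This is only the familiar paired procedure; the paper's techniques for groups are not reproduced, and the data are invented.

```python
from math import comb


def sign_test_p_value(differences):
    """Two-sided exact sign test on paired differences; zero differences are dropped."""
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    positives = sum(1 for d in nonzero if d > 0)
    k = min(positives, n - positives)
    tail = sum(comb(n, j) for j in range(k + 1)) / 2**n  # P(Binomial(n, 1/2) <= k)
    return min(1.0, 2 * tail)


print(sign_test_p_value([1.2, 0.4, 2.0, 0.7, -0.3, 1.1, 0.9, 0.0, 1.5, 0.2, 0.8]))
```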


16. On the Reduced Moment Problem. Salem H. Khamis, Statistical Office,
United Nations.


Let $\Phi(x)$ and $\Psi(x)$, $a \le x \le b$, be any two distinct cumulative distribution functions
which are continuous and differentiable solutions of the reduced moment problem
$\mu_r = \int_a^b x^r\,d\Phi(x)$, $r = 0, 1, 2, \dots, 2n$. A proof is given of the inequality (1) $|\Phi(x) - \Psi(x)| \le K\rho_n(x)$,
where $\rho_n(x) = -|\mu_{i+j}|/D_n(x)$, $i, j = 0, 1, 2, \dots, n$, $D_n(x)$ is the determinant obtained by
bordering the determinant $|\mu_{i+j}|$ by the prefixed row $(0\ 1\ x\ x^2 \cdots x^n)$ and the corresponding
column, and where $0 < K = AB/(A + B - AB) \le \min(A, B) \le 1$, with
$0 < A = 1 + \mathrm{l.u.b.}_{a \le x \le b}(-\Phi'(x)/\Psi'(x)) \le 1$ and $0 < B = 1 + \mathrm{l.u.b.}_{a \le x \le b}(-\Psi'(x)/\Phi'(x)) \le 1$. Inequality
(1) is an improvement of an earlier result by the same author (Proceedings of the Inter- 
national Congress of Mathematicians, 1950, Vol. I, p. 569) which is in turn an improvement 
upon the corresponding Tchebycheff inequality, i.e., without the constant K (Shohat and 
Tamarkin, The Problem of Moments, American Mathematical Society, 1943, p. 72). By a 
special differencing method it is shown that the magnitude of the determinant in the nu- 
merator of p,(z) is independent of the origin of the moments, and that the determinant in 
the denominator is expressible in terms of the moments about the origin z. A method is also 
given for constructing an infinite number of cumulative distribution functions defined over 
a finite interval and possessing equal moments up to any given order, making use of the 
properties of orthogonal polynomials. Inequality (1) is then applied to the special class of 
such cumulative distribution functions associated with the Legendre polynomials. 
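The quantity ρ_n(x) can be computed directly from the moments. The sketch does this in exact rational arithmetic for the uniform distribution on [−1, 1] (moments μ_r = 1/(r + 1) for even r, 0 for odd r), bordering the Hankel determinant |μ_{i+j}| as described above. It compares the result with 1/Σ_{k=0}^n (2k + 1)P_k(x)², the corresponding expression in Legendre polynomials, since the polynomials √(2k + 1) P_k are orthonormal for this distribution. The choice of distribution and all names are ours.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    m = [list(row) for row in matrix]
    size, sign, result = len(m), 1, Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return sign * result


def uniform_moment(r):
    """r-th moment of the uniform distribution on [-1, 1]."""
    return Fraction(1, r + 1) if r % 2 == 0 else Fraction(0)


def rho_from_moments(n, x):
    """rho_n(x) = -|mu_{i+j}| / D_n(x), with D_n bordered by the row (0, 1, x, ..., x^n)."""
    hankel = [[uniform_moment(i + j) for j in range(n + 1)] for i in range(n + 1)]
    border = [Fraction(0)] + [x**k for k in range(n + 1)]
    bordered = [border] + [[border[i + 1]] + hankel[i] for i in range(n + 1)]
    return -det(hankel) / det(bordered)


def rho_from_legendre(n, x):
    """1 / sum_k (2k + 1) P_k(x)^2, using the Legendre three-term recurrence."""
    p_prev, p_curr = Fraction(1), x
    total = Fraction(1)  # k = 0 term
    for k in range(1, n + 1):
        total += (2 * k + 1) * p_curr**2
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return 1 / total


x = Fraction(1, 3)
print([rho_from_moments(n, x) == rho_from_legendre(n, x) for n in range(5)])
```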


17. Canonical Partial Correlations. S. N. Roy and J. Whittlesey, University
of North Carolina.


Canonical partial correlations between a set of $p$ and a set of $q$ variates, after elimination
of a third set of $r$ variates, are obtained by considering the canonical correlations between
a set of $(p + r)$ and a set of $(q + r)$ variates having $r$ variates in common. Suppose $S$ is a
$(p+q+r) \times (p+q+r)$ p.d. covariance matrix partitioned into submatrices such that the
first row is $S_{11}(p \times p)\ S_{12}(p \times q)\ S_{13}(p \times r)$, the second row is $S_{12}'(q \times p)\ S_{22}(q \times q)\ S_{23}(q \times r)$,
and the third row is $S_{13}'(r \times p)\ S_{23}'(r \times q)\ S_{33}(r \times r)$. Then the canonical
partial correlations between the $p$ set and the $q$ set are given by the $p$ nonnegative roots
(all lying between 0 and 1) of the equation in $\theta$:

$$|\theta(S_{11} - S_{13}S_{33}^{-1}S_{13}') - (S_{12} - S_{13}S_{33}^{-1}S_{23}')(S_{22} - S_{23}S_{33}^{-1}S_{23}')^{-1}(S_{12}' - S_{23}S_{33}^{-1}S_{13}')| = 0.$$

Putting (i) $r = 0$, (ii) $p = 1$, (iii) $p = 1$, $q = 1$, and (iv) $p = 1$, $r = 0$, we have respectively
(i) canonical correlations, (ii) multiple partial correlation, (iii) partial correlation, and
(iv) multiple correlation.
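For p = 1 the determinantal equation is scalar, so its single root is the product (S_12 − S_13S_33^(−1)S'_23)(S_22 − S_23S_33^(−1)S'_23)^(−1)(S'_12 − S_23S_33^(−1)S'_13) divided by S_11 − S_13S_33^(−1)S'_13. That is the squared multiple partial correlation of case (ii). With q = 1 as well, it should be the square of the ordinary partial correlation of case (iii). The plain-Python sketch below runs that check on a made-up correlation matrix, and the helper names are hypothetical.

```python
def solve(A, B):
    """Solve A X = B (A square, B with the same number of rows) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def residual_cov(S, keep, elim):
    """S_kk - S_ke S_ee^{-1} S_ek: covariance of the 'keep' set after eliminating 'elim'."""
    base = [[S[i][j] for j in keep] for i in keep]
    if not elim:
        return base
    X = solve([[S[i][j] for j in elim] for i in elim], [[S[i][j] for j in keep] for i in elim])
    return [[base[a][b] - sum(S[keep[a]][e] * X[k][b] for k, e in enumerate(elim))
             for b in range(len(keep))] for a in range(len(keep))]


def squared_multiple_partial_correlation(S, q, r):
    """Root theta for p = 1; variate 0 is the p set, the next q and the last r form the other sets."""
    R = residual_cov(S, list(range(1 + q)), list(range(1 + q, 1 + q + r)))
    Y = solve([row[1:] for row in R[1:]], [[R[i][0]] for i in range(1, 1 + q)])
    return sum(R[0][1 + i] * Y[i][0] for i in range(q)) / R[0][0]


S = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.4],
     [0.3, 0.4, 1.0]]
partial = (0.5 - 0.3 * 0.4) / ((1 - 0.3**2) * (1 - 0.4**2)) ** 0.5
print(squared_multiple_partial_correlation(S, 1, 1), partial**2)
```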
18. A Useful Transformation in the Case of Canonical Partial Correlations.
S. N. Roy, University of North Carolina. 


If the distribution problem in Abstract 31 were to be tackled ab initio, that is, without
assuming the distributions of canonical correlations, the following transformation would be
very helpful: $X_1(p \times n) = U_1(p \times p) \times (D_{\sqrt{\theta}}\ D_{\sqrt{1-\theta}}) \times$ (a $2p \times n$ matrix whose first
row is $L_1(p \times n)$ and whose second row is $L_2(p \times n)$) $+ V_1(p \times r)L_4(r \times n)$. Also
$X_2(q \times n) =$ (a $q \times q$ matrix partitioned into 4 submatrices such that the first row is
$U_{21}(q-p \times p)\ O_{22}(q-p \times q-p)$ and the second row is $U_{31}(p \times p)\ U_{32}(p \times q-p)$) $\times$
(a $q \times n$ matrix whose first row is $L_2(p \times n)$ and whose second row is $L_3(q-p \times n)$) $+$
$V_2(q \times r)L_4(r \times n)$, and lastly $X_3(r \times n) = O_3(r \times r)L_4(r \times n)$, where the
$(p+q+r) \times n$ matrix $X$, which is the reduced matrix of observations, is supposed to be
partitioned into $X_1(p \times n)$, $X_2(q \times n)$, and $X_3(r \times n)$ placed one below the other, $D_\alpha$
stands for a diagonal matrix with elements $(\alpha_1, \dots, \alpha_p)$, $\theta$ is given by the equation in the
above abstract, $O$ stands for any triangular matrix with upper right-hand corner zero, and
$L'(n \times p+p+q-p+r) = (L_1'\ L_2'\ L_3'\ L_4')$ is subject to $LL' = I(p + q + r)$. This
transformation for an $X$ of rank $p + q + r$ can be shown to exist and could also be made
one-to-one if (i) the $\theta$'s are distinct and, say, (ii) the first row of $U_1$ and the diagonal
elements of $O_{22}$ and of $O_3$ are all taken to be positive. This will of course happen almost
everywhere (in the sample space). Erasing $X_3$, $O_3$, $L_4$, $V_1$, and $V_2$, we have the case of
canonical correlations.


19. Uniform Convergence of Distribution Functions. Emanuel Parzen, University
of California, Berkeley.


We determine conditions under which uniform convergence in a parameter $\theta$ of sequences
of characteristic functions implies uniform convergence in $\theta$ of the corresponding sequences
of distribution functions, which may be univariate or multivariate. We then derive a
uniform central limit theorem and a uniform weak law of large numbers for sequences of
independent random variables whose distribution depends on $\theta$. These results may be applied
to obtain conditions for the uniform consistency and uniform asymptotic normality of
maximum likelihood estimates, to be compared with those given by A. Wald (‘‘Asymptotically
most powerful tests of statistical hypotheses,’’ Annals of Math. Stat., Vol. 12
(1941), p. 2).


20. Statistical Aspects of a Linear Programming Problem. D. F. Votaw, Jr.,
Yale University.


The Hitchcock-Koopmans transportation problem is to determine a most economical
program of transporting a homogeneous product (e.g., oil) from origins to destinations. 
The amounts of the product at the origins and required at the destinations are given to- 
gether with the cost of transporting a unit amount from any origin to any destination. 
This paper is concerned with the analogous problem arising when the costs are unknown 
parameters in a distribution from which a sample is available. An application of the analy- 
sis of variance is pointed out, and some results of synthetic sampling are presented. (Re- 
search sponsored by the Office of Naval Research.) 
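For orientation, the deterministic problem can be solved by brute force when it is tiny. The sketch handles two origins with integer amounts, where the flows from the first origin determine everything else. As one simple and purely illustrative way of treating unknown costs, it plugs in sample means of observed costs. The numbers are invented, and nothing here reproduces the paper's analysis-of-variance application or its synthetic sampling.

```python
from itertools import product
from statistics import mean


def two_origin_transport(supply, demand, cost):
    """Cheapest integer plan for 2 origins; requires sum(supply) == sum(demand)."""
    assert sum(supply) == sum(demand)
    best = None
    for first in product(*(range(d + 1) for d in demand)):
        if sum(first) != supply[0]:
            continue
        second = [d - f for d, f in zip(demand, first)]  # nonnegative since f <= d
        total = sum(c1 * f + c2 * s for c1, c2, f, s in zip(cost[0], cost[1], first, second))
        if best is None or total < best[0]:
            best = (total, [list(first), second])
    return best


# Observed unit costs for each route (hypothetical samples); plug in their means.
samples = [[[1.1, 0.9, 1.0], [4.2, 3.8], [6.0, 6.1, 5.9]],
           [[3.0, 3.1, 2.9], [2.0, 2.2, 1.8], [5.1, 4.9]]]
estimated_cost = [[mean(route) for route in row] for row in samples]
print(two_origin_transport([3, 2], [2, 2, 1], estimated_cost))
```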


21. Maximum Likelihood Estimators and A Posteriori Distributions. J. Wolfowitz,
Cornell University.


Let $f(x, \theta)$ be the frequency function at $x$ of each of the independent chance variables
$x_1, \dots, x_n$, whose distribution depends upon the parameter $\theta$. Let $g(\theta')$ be the a priori
density function of $\theta$ at $\theta'$, and let $h(\theta' \mid x_1, \dots, x_n)$ be the a posteriori density function
of $\theta$ at $\theta'$, given $x_1, \dots, x_n$. Under suitable regularity conditions on $f$ and $g$, $h$ is
asymptotically normal, with mean $\hat\theta$ and variance $[nc(\hat\theta)]^{-1}$, where $\hat\theta$ is the maximum likelihood estimate
of $\theta$ from $x_1, \dots, x_n$ and $c(\theta) = \int (\partial \log f(x, \theta)/\partial\theta)^2 f(x, \theta)\,dx$. Thus the influence of $g$
disappears in the limit. The present result includes that of v. Mises (Math. Zeit., Vol. 4 (1919))
for the binomial case, and those of Kolmogoroff (Izvyestya Akad. Nauk SSSR, Ser. Mat.,
Vol. 6 (1942)) for the normal case.
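The binomial case (von Mises) gives a quick numerical check. With a uniform prior g on (0, 1) and k successes in n trials, the posterior is the Beta(k + 1, n − k + 1) distribution, while θ̂ = k/n and c(θ) = 1/[θ(1 − θ)], so the limiting variance is θ̂(1 − θ̂)/n. The uniform prior and the counts are our own illustrative choices.

```python
def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    return a / (a + b), a * b / ((a + b) ** 2 * (a + b + 1))


for n, k in [(20, 6), (200, 60), (20000, 6000)]:
    theta_hat = k / n
    post_mean, post_var = beta_mean_var(k + 1, n - k + 1)
    asymptotic_var = theta_hat * (1 - theta_hat) / n  # [n c(theta_hat)]^{-1}
    print(n, round(post_mean - theta_hat, 6), round(post_var / asymptotic_var, 4))
```

As n grows, the posterior mean approaches θ̂ and the variance ratio approaches 1, so the prior's influence fades as the abstract states.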


22. Estimates and Asymptotic Distributions of Certain Statistics in Information
Theory. (Preliminary Report.) John P. Hoyt, U. S. Naval Academy.


In ‘‘On information and sufficiency’’ (S. Kullback and R. A. Leibler, Annals of Math.
Stat., Vol. 22 (1951), pp. 79-86), the concepts of ‘‘information’’ (designated hereafter as
‘‘i’’) and ‘‘mean information per observation’’ (designated hereafter as ‘‘I’’) for
discrimination between two hypotheses were defined, and various properties of ‘‘J’’ were proved
for the abstract case. In ‘‘An application of information theory to multivariate analysis’’
(S. Kullback, Annals of Math. Stat., Vol. 23 (1952), pp. 88-102), certain applications of
information theory were made to multivariate analysis, but problems of estimation and
distribution were not considered. In the present paper, the characteristic function of the
distribution of ‘‘i’’ in a sample of $n$ from a normal multivariate population is found, and from
this are derived the expected value and variance of ‘‘i’’. A sample estimate of $n$‘‘I’’ is
then considered, assuming equality of means in the two populations and a known value of
one of the variance-covariance matrices occurring in ‘‘J’’. Using unbiased estimates of
the parameters occurring in the other variance-covariance matrix, the characteristic
function of the distribution of the estimate is found and is then used to show that the
estimate's asymptotic distribution is the chi-square distribution with $k(k + 1)/2$
degrees of freedom, $k$ being the number of variates.


23. On Testing One Simple Hypothesis Against Another. Lionel Weiss, University
of Virginia.


Given a sequence $(X_1, X_2, \dots)$ of independently and identically distributed chance
variables, $H_0$ is the hypothesis that the probability density function of each chance variable
is $f_0(x)$, and $H_1$ is the hypothesis that this function is $f_1(x)$. A ‘‘generalized sequential
probability ratio test’’ is defined as the usual Wald sequential probability ratio test, except
that constant limits are not necessarily used; in other words, after the $j$th observation
is taken, accept $H_0$ if the probability ratio is not greater than $B_j$, accept $H_1$ if the ratio is
not less than $A_j$, and otherwise take another observation, where $0 \le B_j \le A_j$. Given any
test $T$ of $H_0$ against $H_1$, not using randomization, and such that the probability that $T$
will terminate is 1 when either $H_0$ or $H_1$ is true, then under mild restrictions on $f_0(x)$ and
$f_1(x)$ the following theorem holds: there exists a sequence $(G_1, G_2, \dots)$ of generalized
sequential probability ratio tests such that Pr(sample size, when using $G_j$ and $H_i$ is true,
is no greater than $n$) = Pr(sample size, when using $T$ and $H_i$ is true, is no greater than $n$)
for all $n$, all integers $j$, and $i = 0, 1$; and also, as $j$ approaches infinity, lim Pr($H_i$ will be
accepted when it is true and $G_j$ is used) exists and is not less than Pr($H_i$ will be accepted
when it is true and $T$ is used), for $i = 0, 1$. If $T$ is a truncated test, a stronger theorem
holds: there exists a generalized sequential probability ratio test, also truncated, enjoying
the above advantages over $T$.
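The definition of a generalized sequential probability ratio test is easy to state in code: it is Wald's test with the constant limits replaced by sequences B_j ≤ A_j. The sketch works with logarithms to avoid overflow. The normal example (f_0 = N(0, 1), f_1 = N(1, 1), for which log[f_1(x)/f_0(x)] = x − 1/2) and the particular boundary sequences are our own illustrative choices, including a truncated version whose limits meet at a preassigned stage.

```python
def generalized_sprt(observations, log_ratio, log_B, log_A):
    """Return ('H0' or 'H1' or None, number of observations used).

    log_B(j) <= log_A(j) are the logarithms of the limits B_j and A_j after the j-th
    observation; B_j = 0 corresponds to log_B(j) = -math.inf.
    """
    total = 0.0
    for j, x in enumerate(observations, start=1):
        total += log_ratio(x)  # log of the probability ratio f1/f0 after j observations
        if total <= log_B(j):
            return "H0", j
        if total >= log_A(j):
            return "H1", j
    return None, len(observations)  # data ran out before a decision


def normal_log_ratio(x):
    """log f1(x)/f0(x) for f0 = N(0, 1) and f1 = N(1, 1)."""
    return x - 0.5


wald = (lambda j: -2.0, lambda j: 2.0)                     # constant limits
truncated = (lambda j: -2.0 + 0.5 * j if j < 4 else 0.0,   # limits close in and meet at j = 4
             lambda j: 2.0 - 0.5 * j if j < 4 else 0.0)

print(generalized_sprt([2.0, 2.0, 2.0], normal_log_ratio, *wald))       # ('H1', 2)
print(generalized_sprt([0.6, 0.4, 0.5, 0.3], normal_log_ratio, *truncated))
```

Because the truncated limits coincide at stage 4, that test always stops by the fourth observation.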


24. Extreme Value Theory for m-Dependent Stationary Sequences of Continuous
Random Variables. Geof. Watson, University of Melbourne.
The distributions, and their limits as $N \to \infty$, of the order statistics of $N$ successive
observations in a sequence of independent continuous random variables with a common
distribution function are well known. The present paper considers the same problem for
sequences governed by stationary $m$-dependent probability laws. A stationary sequence is
called $m$-dependent if $P(x_0 \le k_0 \mid x_1 \le k_1, \dots) = P(x_0 \le k_0 \mid x_1 \le k_1, \dots, x_m \le k_m)$.
These distributions are found here in the general case, and it is shown that, as $N \to \infty$, their
limiting forms are the same as the distributions obtained in the case of independence,
provided $\max\{P(x_i > k, x_j > k)/P(x > k)\} \to 0$ as $k \to \beta$ ($k < \beta$) and
$\max\{P(x_i \le k, x_j \le k)/P(x \le k)\} \to 0$ as $k \to \alpha$ ($k > \alpha$), where the maximum is taken for
$i, j = 1, \dots, m + 1$, $i \ne j$, and where $(\alpha, \beta)$ is the range of the random variables $x_t$
($t = \dots, -1, 0, 1, \dots$). Either or both of $\alpha$ and $\beta$ may be infinite. These latter conditions
are shown to be satisfied in all stationary normal processes. Thus the results of this paper
give the limiting distributions of the order statistics in a sample of successive observations
from any normal stationary autoregressive process.


25. Sequential Tests and Estimates for Comparing Poisson Populations. Allan
Birnbaum, Columbia University.


The problem of testing a hypothesis on $\gamma = \lambda_2/\lambda_1$ is considered, where $\lambda_1$, $\lambda_2$ are the means
of two Poisson populations. It is shown that no nonsequential test of $H_0: \gamma = \gamma_0$ against
$H_1: \gamma = \gamma_1$ can have size uniformly $\le \alpha$ and power uniformly $\ge 1 - \beta$ ($1 - \beta > \alpha$); a simple
sequential test (not of the Wald type) is given which has these requirements of size and
power against one- or two-sided alternatives. The generalization to the problem of
classifying $\gamma$ into one of $k$ intervals is indicated. Comparisons are made with the Wald sequential
tests of $H_0: \gamma = \gamma_0$ against one-sided alternatives and of $H_0: \gamma = 1$ against two-sided
alternatives. The latter one-sided tests are constructed by use of a simple sufficient condition
for the existence of a sequential probability ratio test of a composite hypothesis $H_0: \theta \in \omega_0$
against a composite alternative $H_1: \theta \in \omega_1$, of size approximately $\alpha$ for all $\theta \in \omega_0$ and power
approximately $1 - \beta$ for $\theta \in \omega_1$. Application of this condition to problems of comparing two
populations with Koopman-form distributions also gives tests which include those given
by Girshick (‘‘Contributions to the Theory of Sequential Analysis. I,’’ Annals of Math.
Stat., Vol. 17 (1946), pp. 123-143), and some new tests for comparing variances of two
normal populations. Tests of equality of ratios of means of two pairs of Poisson
populations are given.
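A device often used for comparing two Poisson means is to observe both processes for equal exposure and record, for each event, which population produced it. It is offered here only as an illustrative assumption, since the abstract does not say how the tests are built. Given the total count, each event comes from the second population with probability γ/(1 + γ), free of the nuisance scale, so a Wald sequential probability ratio test of γ = γ_0 against γ = γ_1 can be run on this Bernoulli sequence. Wald's approximate limits log[β/(1 − α)] and log[(1 − β)/α] are used, and the function name is hypothetical.

```python
import math


def poisson_ratio_sprt(labels, gamma0, gamma1, alpha, beta):
    """Wald SPRT on event labels (1 or 2) for H0: gamma = gamma0 vs H1: gamma = gamma1."""
    p0, p1 = gamma0 / (1 + gamma0), gamma1 / (1 + gamma1)  # P(event is from population 2)
    lower, upper = math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)
    total = 0.0
    for j, label in enumerate(labels, start=1):
        total += math.log(p1 / p0) if label == 2 else math.log((1 - p1) / (1 - p0))
        if total <= lower:
            return "H0", j
        if total >= upper:
            return "H1", j
    return None, len(labels)


print(poisson_ratio_sprt([2] * 10, 1.0, 3.0, 0.05, 0.05))  # ('H1', 8)
print(poisson_ratio_sprt([1] * 10, 1.0, 3.0, 0.05, 0.05))  # ('H0', 5)
```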


26. Sequential Decision Problems in the Stationary Case. J. Kiefer, Cornell
University.


Results of Wald and Wolfowitz (‘‘Bayes’ solutions of sequential decision problems,” 
Ann. Math. Stat., Vol. 21 (1950), pp. 82-99; also, Chap. 4 of Wald’s Statistical Decision 
Functions, John Wiley and Sons, 1950) are generalized to the case where the chance variables 
are no longer assumed independent, but instead form a stationary process. Questions of 
measurability and existence, recurrence formulas, characterizations of Bayes’ solutions, 
etc., are simplified by first considering only nonrandomized decision functions and by then
using results of the same authors (‘‘Two methods of randomization in statistics and the 
theory of games,’’ Ann. Math., Vol. 53 (1951), pp. 581-586) to extend the conclusions to 
randomized procedures. The essential difference between the independent case and, e.g.,
the stationary Markoff case is that in the latter a Bayes' solution may depend at each
stage on the last observation as well as on the a posteriori distribution. For example, a
Bayes' solution for testing between two simple hypotheses in the Markoff case is
characterized by two functions $B(x) \le A(x)$ (which under slight restrictions are continuous),
which are used after $m$ observations, the last of which is $x_m$, by comparing the probability ratio
to $B(x_m)$ and $A(x_m)$. Unlike the independent case, $B(x)$ and $A(x)$ cannot in general be
replaced by constants independent of $x$; nor does every pair $B(x)$, $A(x)$ constitute a Bayes'
solution relative to some weight function, cost, and a priori distribution (as does every
pair $B$, $A$ in the independent case); nor need a Bayes' solution possess the optimum property
of the independent case.


27. Random Functions Satisfying Certain Linear Relations. II. Sudhish G.
Ghurye, University of North Carolina.


The particular case $k = 1$ of the problem mentioned in Part I is considered here in detail.
Let $X(t)$ be a $p$-dimensional, real-valued random function, defined and continuous in
probability for all $t$ in an interval $[t_0, T]$. Further, let there exist a real-valued $p \times p$
matric function $A(h)$, defined and continuous for $h > 0$, such that if we write
$Y(k; h) = X(t_0 + kh) - A(h)X(t_0 + [k - 1]h)$, then for any positive $h$ and any integer $n$ ($nh \le T - t_0$),
$X(t_0), Y(1; h), \dots, Y(n; h)$ are mutually independent. Then it is shown that $A(h)$ can be
written in the form $e^{hB}$, where $B$ is a constant matrix, and that $X^*(t) = e^{-Bt}X(t)$ is a random
function with independent increments (r.f.i.i.). It is also shown that if $Z(t)$ is any
$p$-dimensional r.f.i.i. and $A(t)$ is a $p \times p$ matric function, continuous and of bounded
variation, then the integral $\int_{t_0}^{t} A(v)\,dZ(v)$ exists as the unique limit-in-distribution of the
sequence of approximating sums. From this, a one-to-one correspondence (in distribution)
is established between the random functions $X(t)$ mentioned above and the random functions
$\int_{t_0}^{t} e^{(t-v)B}\,dZ(v)$.
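The exponential form of A(h) is what lets steps of different lengths fit together. Heuristically, one step of length h_1 followed by one of length h_2 should agree with a single step of length h_1 + h_2, which requires A(h_1 + h_2) = A(h_1)A(h_2). Matrices of the form e^(hB) satisfy this. The standard-library sketch below checks the property numerically, using a scaling-and-squaring Taylor series for the matrix exponential. The matrix B is an arbitrary example, and `step_matrix` is a hypothetical name.

```python
import math


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def expm(M, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(M)
    norm = max(sum(abs(v) for v in row) for row in M)
    s = 0
    while norm > 0.5:  # scale so the series converges quickly
        norm /= 2
        s += 1
    scaled = [[v / 2**s for v in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(result, term)]
    for _ in range(s):  # undo the scaling: e^M = (e^(M / 2^s))^(2^s)
        result = mat_mul(result, result)
    return result


def step_matrix(h, B):
    """A(h) = exp(hB)."""
    return expm([[h * v for v in row] for row in B])


B = [[-1.0, 2.0], [0.0, -3.0]]
print(mat_mul(step_matrix(0.3, B), step_matrix(0.5, B)))
print(step_matrix(0.8, B))  # agrees with the product above up to rounding
D = step_matrix(1.0, [[-1.0, 0.0], [0.0, -2.0]])
print(D, math.exp(-1), math.exp(-2))  # for diagonal B, the diagonal is exponentiated
```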


28. Optimal Designs for Estimating Parameters. (Preliminary Report.) Herman
Chernoff, Stanford University.


The following is a generalization of a result of Elfving (see ‘“Optimum allocation in 
linear regression theory,’’ Annals of Math. Stat., Vol. 23 (1952), pp. 255-262). It is desired 
to estimate parameters $\theta_1, \theta_2, \dots, \theta_s$. There is available a set of experiments which may
be performed. The probability distribution of the data obtained from any of these
experiments may depend not only on $\theta_1, \theta_2, \dots, \theta_s$ but also on the nuisance parameters
$\theta_{s+1}, \theta_{s+2}, \dots, \theta_k$. One is permitted to select a design consisting of $n$ of these experiments to be
performed independently. The repetition of experiments is permitted in the design. Then
it can be shown that, under mild conditions and for large $n$, locally optimal designs may be
approximated by selecting a set of $r \le k + (k - 1) + \dots + (k - s + 1)$ of the experiments
available and by repeating each of these $r$ experiments in certain specified proportions.
The criterion of optimality used is a natural one involving the information matrices of the 
experiments. 


29. The Distribution of the nth Variate in Certain Chains of Serially Dependent
Populations. L. V. Toralballa, Marquette University.


The following is representative of the problems considered: Let $P_1, P_2, \dots, P_n$ be a
sequence of normally distributed populations, the first having a mean $m_1$ and variance $\sigma_1^2$,
each population after the first having a mean $m_i = ax_{i-1} + b$, where $x_{i-1}$ is a random value
of the variate in $P_{i-1}$, and a variance $\sigma_i^2$. One seeks the absolute distribution of the variate
in $P_n$. In this particular case it is found that the absolute distribution of the variate in $P_n$
is normal, with mean $b\sum_{k=0}^{n-2}a^k + a^{n-1}m_1$ and variance $\sum_{k=0}^{n-1}a^{2k}\sigma_{n-k}^2$.
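The stated mean and variance follow by unrolling x_i = a x_{i−1} + b + e_i, where e_i is normal with mean 0 and variance σ_i², and x_1 has mean m_1 and variance σ_1². The sketch checks the closed forms against the step-by-step recursions mean_i = a·mean_{i−1} + b and var_i = a²·var_{i−1} + σ_i². The numerical values are arbitrary.

```python
def chain_moments_recursive(m1, a, b, variances):
    """Mean and variance of the variate in P_n by stepping through the chain."""
    mean, var = m1, variances[0]
    for sigma2 in variances[1:]:
        mean, var = a * mean + b, a * a * var + sigma2
    return mean, var


def chain_moments_closed(m1, a, b, variances):
    """b * sum_{k=0}^{n-2} a^k + a^(n-1) m1  and  sum_{k=0}^{n-1} a^(2k) sigma^2_{n-k}."""
    n = len(variances)
    mean = b * sum(a**k for k in range(n - 1)) + a ** (n - 1) * m1
    var = sum(a ** (2 * k) * variances[n - 1 - k] for k in range(n))
    return mean, var


sigmas2 = [4.0, 1.0, 2.5, 0.5, 3.0]  # sigma_1^2, ..., sigma_5^2
print(chain_moments_recursive(10.0, 0.8, 1.5, sigmas2))
print(chain_moments_closed(10.0, 0.8, 1.5, sigmas2))
```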
30. An Experimental Method for Obtaining Random Digits and Permutations.
J. E. Walsh, U.S. Naval Ordnance Test Station, China Lake.


This paper presents an easily applied method for obtaining small numbers of random 
binary digits and random permutations. The procedure consists in flipping ordinary minted 
coins and combining the results of the flips in an appropriate manner. Digits and permuta- 
tions obtained according to the method of this paper can be considered sufficiently random 
for any practical application. It appears likely that these digits and permutations are much 
more nearly random than most of those now available in printed tables. Moreover, any 
possibility of bias from misuse of tables is avoided. The method presented is particularly 
suitable for use with respect to experimental designs. Only a few random permutations are 
ordinarily required for a given experimental design. 
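The abstract does not describe its combination rule, so the following is only one common way, assumed here, to turn fair coin flips into random permutations. Read ⌈log₂ k⌉ flips as a binary number and start over whenever it is k or more; this yields a uniform integer below k. A uniform random permutation then follows from the Fisher-Yates shuffle. In the sketch, `flip` stands for any source of fair 0/1 outcomes; a seeded pseudo-random generator is used only so that the example runs.

```python
import random


def uniform_below(k, flip):
    """Uniform integer in {0, ..., k - 1} from fair coin flips, by rejection."""
    bits = (k - 1).bit_length()
    while True:
        value = 0
        for _ in range(bits):
            value = 2 * value + flip()  # each flip supplies one binary digit
        if value < k:
            return value


def random_permutation(n, flip):
    """Fisher-Yates shuffle of 1, ..., n driven by coin flips."""
    items = list(range(1, n + 1))
    for i in range(n - 1, 0, -1):
        j = uniform_below(i + 1, flip)
        items[i], items[j] = items[j], items[i]
    return items


coin = random.Random(1952)
print(random_permutation(8, lambda: coin.getrandbits(1)))
```

Rejection wastes some flips, but every accepted value, and hence every permutation, is exactly equally likely when the coin is fair.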


31. Distribution of Canonical Partial Correlations. S. N. Roy, University of 
North Carolina. 


By certain general arguments, the distribution of canonical partial correlations in random
samples of size $n + 1$ from a $(p + q + r)$-variate normal population ($p \le q$, $p + q + r \le n$)
can be shown to be of the same form as that of canonical correlations in random samples
of size $n + 1 - r$, and involves as parameters (on the non-null hypotheses) the $p$ roots (all
lying between 0 and 1) of the equation in $\theta$:

$$|\theta(\Sigma_{11} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{13}') - (\Sigma_{12} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{23}')(\Sigma_{22} - \Sigma_{23}\Sigma_{33}^{-1}\Sigma_{23}')^{-1}(\Sigma_{12}' - \Sigma_{23}\Sigma_{33}^{-1}\Sigma_{13}')| = 0,$$

where the population covariance matrix $\Sigma$ (supposed to be p.d.) is partitioned in the same
manner as the sample covariance matrix $S$ of Abstract 17.
—— 
In the abstract ‘‘On judging all contrasts in the analysis of variance’’ by Henry Scheffé
(Annals of Math. Stat., Vol. 23 (1952), p. 477), the equation $\sum_{i=1}^{k} c_i = 0$ was printed incorrectly
(due to a compositor's error) as $\sum_{i=1}^{k} c_i\theta_i = 0$ on line 5.


NEWS AND NOTICES 


Readers are invited to submit to the Secretary of the Institute news items of interest.
Personal Items 


Mr. Fred C. Andrews has been appointed a Research Associate in the Applied 
Mathematics and Statistics Laboratory, Stanford University, Stanford, Cal- 
ifornia. 

Edward W. Barankin, Assistant Professor at the Statistical Laboratory, Uni- 
versity of California, Berkeley, has been promoted to Associate Professor. For 
the academic year 1952/53, Dr. Barankin will be on leave, working at the Insti- 
tute for Numerical Analysis, Los Angeles. 

Z. W. Birnbaum, who has been on leave from the University of Washington 
for the academic year 1951-1952 and had a visiting professorship in the Depart- 
ment of Statistics at Stanford University, has returned to resume his duties at 
the University of Washington. 

Dr. K. A. Bush, formerly Associate Professor of Mathematics at State Uni- 
versity of New York, Champlain College, has accepted an appointment as
Assistant Professor of Mathematical Statistics at the University of Illinois. 

Charles W. Dunnett, who has been granted leave from his position as Biometrician at the
Food and Drug Laboratories, Ottawa, Canada, has joined the Department of
Mathematics, Cornell University, for the academic year 1952-3.

Franklin Graybill has recently received his Ph.D. degree from Iowa State 
College and accepted a position at Oklahoma Agriculture and Mechanical College 
as Assistant Professor of Mathematics and Associated Statistician to the
Agriculture Experiment Station and Research Foundation.

John Gurland, formerly with the Cowles Commission and the Committee on 
Statistics at the University of Chicago, has joined the staff of the Statistical 
Laboratory at Iowa State College in Ames. 

Wayne W. Gutzman is on leave of absence from the University of South 
Dakota since being recalled for active duty in the United States Navy and is 
a Lieutenant Commander and acting Officer-in-Charge of the Computation 
Ballistics Department of the Naval Proving Ground, Dahlgren, Virginia. 

Harman L. Harter, for the past three years an Assistant Professor of Math- 
ematics at Michigan State College, has accepted a position as a Mathematical 
Statistician at Wright Air Development Center, Wright-Patterson Air Force 
Base, Dayton, Ohio. 

Harry M. Hughes, Instructor at the Statistical Laboratory, University of 
California, Berkeley, has been promoted to Assistant Professor. 

T. J. Jaramillo, formerly Actuary of the Philippine-American Life Insurance 
Company of Manila, has been appointed as Senior Scientist of the Division 
of Engineering Mechanics Research of the Armour Research Foundation of the 
Illinois Institute of Technology, Chicago. 

Robert H. Matthias has accepted a position in the Research Division,
Electrochemicals Department of E. I. du Pont de Nemours & Co., Niagara Falls.

Lincoln E. Moses has accepted a joint appointment at Stanford University, 
California, as Assistant Professor in the Department of Statistics and Assistant 
Professor of Public Health and Preventive Medicine in the Medical School. 

Bruce D. Mudgett is now Emeritus Professor of Economics of the University 
of Minnesota and is residing at Thetford, Vermont. 

T. Ellison Neal, formerly statistician for the Textile Division of the U. S. 
Rubber Company, has joined the staff of the Research Department of the newly 
formed Chemstrand Corporation at Decatur, Alabama. 

B. E. Phillips, since the termination of the research phase of the Parsons- 
Aerojet Company work at Air Force Missile Test Center, Cocoa, Florida, has 
been transferred to the Facilities Operation Division of the Ralph M. Parsons 
Company, Frederick, Maryland. 

John Schmid, Jr., resigned from the Board of Examiners of Michigan State 
College to accept a position as research psychologist with the Research Services 
Division of the Human Resources Research Center at Lackland Air Force Base, 
San Antonio, Texas. 
Benjamin J. Tepping during the academic year 1952-53 will be on leave as
statistician in the Bureau of the Census and will be Visiting Lecturer in the 
Department of Mathematics and Research Associate in the Survey Research 
Center, University of Michigan. 

Dr. Milton E. Terry has resigned his position as Associate Professor of Statis- 
tics at the Virginia Polytechnic Institute and has accepted an appointment as 
Statistician at the Bell Telephone Laboratories, Murray Hill, New Jersey.

Chia Kuei Tsao has accepted a position as instructor in the Department of 
Mathematics, Wayne University. 

Dr. John E. Walsh, formerly a Consultant with the Census Bureau, is now 
with the Central Evaluation Group, Code 0110, U. S. Naval Ordnance Test
Station, Inyokern, China Lake, California. 

Lowell A. Woodbury will be employed for the next two years by the Atomic 
Bomb Casualty Commission at Hiroshima, Japan, as Chief Biostatistician. 


Sidney B. Clark 


Sidney B. Clark, member of the Institute for six years, died of a heart attack 
in Washington, D. C., at the age of 33. He received his B. A. degree at George 
Washington University. At the time of his death Mr. Clark was a statistician in 
the National Production Authority. He had been a statistical consultant in the 
Bureau of Agricultural Economics of the Department of Agriculture and had 
been employed earlier in other government agencies. 




The Educational Testing Service is offering for 1953-54 its sixth series of 
research fellowships in psychometrics leading to the Ph.D. degree at Princeton 
University. Open to men who are acceptable to the Graduate School of the 
University, the two fellowships each carry a stipend of $2,500 a year and are 
normally renewable. Fellows will be engaged in part-time research in the general 
area of psychological measurement at the offices of the Educational Testing 
Service and will, in addition, carry a normal program of studies in the Graduate 
School. Competence in mathematics and psychology is a prerequisite for obtain- 
ing these fellowships. The closing date for completing applications is January 
16, 1953. Information and application blanks may be obtained from: Director 
of Psychometric Fellowship Program, Educational Testing Service, 20 Nassau 
Street, Princeton, New Jersey. 




Three $4000 post-doctoral fellowships in statistics are offered for 1953-54 by 
the University of Chicago. The purpose of these fellowships, which are open to 
holders of the doctor’s degree or its equivalent in research accomplishment, is to 
acquaint established research workers in the biological, physical, and social 
sciences with the crucial role of modern statistical analysis in the planning of 
experiments and other investigative programs and in the analysis of empirical 
data. The development of the field of statistics has been so rapid that most
current research falls far short of attainable standards, and these fellowships (which
represent the third year of a five-year program supported by The Rockefeller 
Foundation) are intended to help reduce the lag by giving statistical training 
to scientists whose primary interests are in substantive fields rather than in 
statistics itself. The closing date for applications is February 1, 1953; instruc- 
tions for applying may be obtained from the Committee on Statistics, University 
of Chicago, Chicago 37, Illinois. 




New Members 


The following persons have been elected to membership in the Institute 
(May 31, 1952 to August 29, 1952) 


Boggs, Arthur B., M.A. (University of Michigan), Graduate Student, Department of Math- 
ematics, Michigan State College, 136 Albert Avenue, East Lansing, Michigan. 

Borden, Nathan B., M.S. (University of Michigan), Graduate Student, University of Mich- 
igan, 1925 Winston Avenue, Louisville 5, Kentucky. 

Broderick, Timothy S., M.A. (Trinity College, Dublin), Professor of Mathematics, Trinity
College, Dublin, St. Kevin’s, Sorrento Road, Dalkey, Co. Dublin, Ireland. 

Eckler, A. Ross, M.A. (Princeton), Graduate Student and Assistant, Mathematics Depart- 
ment, Princeton University, 225-C King Street, Princeton, New Jersey. 

Herd, G. Ronald, M.A. (Kansas University), Chief Statistician and Supervisor of Statistics 
Department, Aeronautical Radio, Inc., Military Tube Project, 469 Hampton Court, 
Tyler Gardens, Falls Church, Virginia. 

Hossain, Khondkar Manwar, M.A. (Dacca University, Pakistan), Research Student in 
Statistics, London School of Economics and Political Science, Pigeon Hole “H”,
Research Common Room, London School of Economics and Political Science, Houghton
Street, Aldwych, W.C. 2, England. 

Khamis, Salem H., Ph.D. (University of London), Statistician, United Nations, New York, 
Statistical Office, United Nations, P.O.B. 20, Grand Central Post Office, New York, N.Y. 

LeCam, Lucien M., Ph.D. (University of California), Instructor, University of California, 
Department of Mathematics, Statistical Laboratory, University of California, Berkeley 
4, California. 

Morris, Leo E., M.S. (University of Washington, Seattle), Analytical Statistician, Quality 
Evaluation Laboratory, U.S. Naval Ammunition Depot, Bangor, Washington, 2013 
Parkside Drive, Bremerton, Washington. 

Parzen, Emanuel, M.A. (University of California), Research Assistant in Mathematics, 
University of California, Department of Mathematics, University of California, Berkeley 
4, California. 

Ransom, William E., B.S. (University of Illinois), Statistician, Evaluation and Quality 
Control Branch, Ordnance Ammunition Center, Joliet, Illinois, 1606 New Lenox Road, 
Joliet, Illinois. 

Rowan, Michael B., A.B. (George Washington University), Graduate Student, 358 N. 
Washington Street, Falls Church, Virginia. 

Rutledge, Robert W., B.S. (Sydney), Head Chemist, Pyrmont Distillery, c/o The Colonial 
Sugar Refining Company, Ltd., Pyrmont Distillery, Pyrmont, New South Wales. 

Sarhan, Ahmed E. E., B.S. (Fouad I University, Cairo), Statistician, Medical Research
Laboratories, Cairo, 1 Clydesdale Rd., Hoylake, Cheshire, England. 
Storer, Robert L., B.S. (University of Nebraska), Statistician, Chief, Evaluation and
Quality Control Branch, Ordnance Ammunition Center, 718 John St., Joliet, Illinois. 

Wormleighton, Ralph, M.A. (Toronto), Defence Research Service Officer, C.A.R.D.E., 
P.O. Box 1427, Quebec, P.Q., Canada. 

Zindler, Hans-Joachim, Diplom-Mathematiker (University of Göttingen), Scientific
Assistant, State Office for Automotive Vehicles, (24b) Flensburg, Jurgensgaarderstr.
57, British Zone, Germany.




REPORT OF THE EAST LANSING MEETING OF THE INSTITUTE 


The fifty-third meeting and the fourteenth summer meeting of the Institute 
of Mathematical Statistics was held in East Lansing, Michigan, at Michigan 
State College on September 2-5, 1952. The meeting was held in conjunction
with meetings of the Mathematical Association of America, the American Math- 
ematical Society, the Econometric Society, and the Pi Mu Epsilon fraternity. 
Two sessions were co-sponsored by the Econometric Society, and three sessions 
were co-sponsored by the American Meteorological Society and by the Ameri- 
can Geophysical Union. The following 119 members of the Institute attended: 


O. P. Aggarwal, G. E. Albert, C. B. Allendoerfer, R. L. Anderson, T. W. Anderson, K. J. 
Arnold, J. L. Bagg, W. D. Baten, R. E. Bechhofer, M. H. Belz, T. A. Bickerstaff, Allan 
Birnbaum, David Blackwell, C. R. Blyth, R. C. Bose, G. W. Brier, J. C. Brixey, G. W. 
Brown, J. H. Bushey, Enrique Cansado, Osmer Carpenter, R. H. Cole, T. F. Cope, A. H. 
Copeland, E. L. Cox, J. W. Coy, C. C. Craig, D. A. Darling, W. J. Dixon, J. L. Doob, P. S.
Dwyer, Benjamin Epstein, H. P. Evans, Evelyn Fix, E. A. Fosler, J. S. Frame, D. A. S.
Fraser, Bernard Friedman, J. E. Garrett, H. M. Gehman, M. A. Girshick, R. K. Haddad,
S. T. Hadden, P. C. Hammer, M. H. Hansen, T. E. Harris, M. H. Henry, G. R. Herd,
Clifford Hildreth, J. L. Hodges, Jr., Wassily Hoeffding, R. G. Hoffman, R. V. Hogg, Jr.,
W. C. Hood, Harold Hotelling, H. S. Houthakker, C. C. Hurd, P. K. Ito, W. W. Jacobs,
T. A. Jeeves, W. H. Jones, Leo Katz, S. H. Khamis, W. M. Kincaid, L. A. Knowler, T. C.
Koopmans, William Kruskal, H. G. Landau, L. M. LeCam, E. L. Lehmann, G. J.
Lieberman, G. F. Lunger, H. B. Mann, F. J. Massey, Jr., J. W. Mauchly, K. O. May, D. M. Mesner,
Robert Mirsky, F. C. Mosteller, C. J. Nesbitt, Jerzy Neyman, M. L. Norden, E. G. Olds,
Ingram Olkin, Richard Otter, James Pachares, Emanuel Parzen, G. B. Price, Mina Rees,
P. R. Rider, F. D. Rigby, D. D. Rippe, Murray Rosenblatt, S. N. Roy, Herman Rubin,
David Rubinstein, I. R. Savage, Henry Scheffé, E. D. Schell, Elizabeth Scott, Esther
Seiden, L. D. Simmons, W. B. Simpson, Rosedith Sitgreaves, Arthur Stein, C. M. Stein, L. 
M. Steinberg, Zenon Szatrowski, W. F. Taylor, H.C.S. Thom, F. H. Tingey, Leo Tornqvist, 
C. K. Tsao, A. W. Tucker, J. W. Tukey, D. F. Votaw, Jr., Allen Wallis, J. Ernest Wilkins, 
Jr., B. J. Winer, M. A. Woodbury. 


The meeting opened on Tuesday, September 2, at 10:00 A.M. with a session 
on Stochastic Phenomena in Medicine. The chairman was Professor Frederick 
Mosteller, Harvard University, and the following papers were given: 


1. Some Stochastic Procedures Applied to Data on Tuberculin and Histoplasmin Skin Tests. 
William F. Taylor, School of Aviation Medicine, Randolph Field. 
2. Further Contributions to the Theory of Contagion. Grace E. Bates, Mount Holyoke
College and University of California. (Read by Evelyn Fix, University of California.)
3. An Application of Markov Processes to Problems of Incidence and Epidemiology of Mental
Disease. Andrew Marshall and Herbert Goldhamer, Rand Corporation. (Presented, 
in outline, by T. E. Harris, Rand Corporation.) 


At 2:00 P.M. on the same day a session on Recent Developments in Measure- 
ment in the Social Sciences was held with Professor Leo Katz, Michigan State 
College, as chairman. The following papers were given: 


1. Testing Organization Theories. M. M. Flood, Rand Corporation. 

2. Asymptotic Distributions of Estimates for Latent Structure Analysis. T. W. Anderson, 
Columbia University. 

3. Problem of Social Choice and Individual Values. Leo A. Goodman, University of 
Chicago. 


Discussion was by Professor John W. Tukey, Princeton University, and Pro- 
fessor Max A. Woodbury, University of Pennsylvania. 

A session of contributed papers was held at 10:15 A.M. on Wednesday, Sep- 
tember 3, with Professor Leo A. Goodman, University of Chicago, as chairman. 
The following papers were delivered: 


1. An Extension of Massey’s Distribution of the Maximum Deviation between Two Sample 
Cumulative Step Functions. Preliminary Report. Chia Kuei Tsao, Wayne University. 

2. Polynomial Correlation Coefficients. W. D. Baten and J. S. Frame, Michigan State 
College. 

3. Truncated Poisson Distributions. Paul R. Rider, Wright-Patterson Air Force Base 
and Washington University. 

4. Frequency Distributions for Functions of Rectangularly Distributed Random Variables. 
Stuart T. Hadden, Socony-Vacuum Laboratories, Paulsboro, New Jersey. 

5. On Truncated Rules of Action. Preliminary Report. Benjamin Epstein, Wayne Univers- 
ity. 

6. The Distribution of the Difference of Two Independent Chi-Squares. James Pachares, 
University of North Carolina. 

7. Partially Balanced Designs with Two Plots Per Block. R. C. Bose, University of North 
Carolina, and K. R. Nair, University of North Carolina and Forest Research Insti- 
tute, Dehradun, India. 

8. Minimax Sampling and Estimation in Finite Populations. Om Prakash Aggarwal, 
Stanford University. 

9. Some Two-Sample Tests on the Exponential Distribution. Preliminary Report. (By 
title.) Benjamin Epstein and Chia Kuei Tsao, Wayne University 

10. Efficiency of Estimators of the Mean of an Exponential Distribution Based Only on the 
rth Smallest Observation in an Ordered Sample. (By title.) Benjamin Epstein, Wayne 
University. 

11. On the Theory of Systematic Sampling. III. (By title.) William G. Madow, University 
of Illinois.

12. The Power of Some Service Tests. (By title.) Leo A. Goodman, University of Chicago. 


A Special Invited Paper was given by Professor Harold Hotelling, University 
of North Carolina, at 2:00 P.M. on September 3. The title of Professor
Hotelling’s address was Distribution of Quadratic Forms. Professor P. S. Dwyer,
University of Michigan, presided.

A session on Recent Developments in Estimation and Hypothesis Testing in the 
Nonparametric Case was held at 3:00 P.M. on September 3 with Professor W. J.
Dixon, University of Oregon, as chairman. The following papers were presented: 


1. The Power of Nonparametric Tests. Erich L. Lehmann, University of California, 
Berkeley. 

2. The Power of Certain Nonparametric Tests. Wassily Hoeffding, University of North 
Carolina. 

3. Nonparametric Theory: Confidence Regions and Tests for Location and Scale Parameters. 
D. A. S. Fraser, University of Toronto.


Discussion was by Professor J. W. Tukey, Princeton University, and Mr. I. R. 
Savage, National Bureau of Standards. 

The three sessions of Thursday, September 4, dealt with problems of Cloud 
Seeding and were co-sponsored by the American Meteorological Society and 
the American Geophysical Union. The morning session, held at 9:30 A.M., was 
entitled Cloud Seeding: The Problem and had as its chairman Professor Harold 
Hotelling, University of North Carolina. The following papers were given: 


1. Cloud Seeding; a Problem of National Importance. R. R. Reynolds, Division of Water 
Resources, Sacramento. 

2. Physical Basis of Cloud Seeding. H.G. Houghton, Department of Meteorology, Mass- 
achusetts Institute of Technology. 

3. The Physics of Cloud Seeding and the Results of Laboratory Experiments. Vincent J. 
Schaefer, General Electric Company, Schenectady. 


The afternoon session began at 2:00 P.M. and was a Review of Already Published 
Evaluations of Cloud Seeding Experiments. The chairman was Dr. Howard T. 
Orville, Bendix Aviation Corporation, and the following papers were given: 


1. Evaluation of the Bishop Creek, California, Cloud Seeding Tests. Ferguson Hall, Scien- 
tific Services Division, Weather Bureau. 

2. Methods of Evaluating the Effects of Periodic Silver Iodide Seeding. Irving Langmuir, 
General Electric Company, Schenectady. 

3. Progress of Cloud Seeding Analysis in Oregon. Robert T. Beaumont, Bureau of Soil 
Conservation, Medford, Oregon. 

4. Some Pitfalls Encountered in Certain Current Methods of Evaluation. T. A. Jeeves, 
L. LeCam, E. L. Scott, University of California, Berkeley. 

5. An Approach to the Evaluation of Results of Rainmaking, A Progress Report. C. E. 
Buell, University of New Mexico. 


In the evening, at 7:00 P.M., a session was held with the title Proposed Statistical 
Methodology and Round Table Discussion. The chairman was Professor Henry 
Scheffé, Columbia University, and the following papers were given: 


1. On Proposed Statistical Methodology. J. Neyman, University of California, Berkeley. 

2. Methods of Evaluating Cloud Seeding Operations. Herbert C.S. Thom, Weather Bureau 
and Cornell University. 

3. Statistical Problems Encountered in the Analysis of Cloud Seeding Experiments. Glenn 
W. Brier, Statistical Section, Weather Bureau. 


Each of the three Cloud Seeding sessions was followed by a discussion. 


On Friday, September 5, at 10:00 A.M. a second session of contributed papers
was held with Professor C. C. Craig, University of Michigan, presiding. The 
following papers were presented: 


1. A Minimal Essentially Complete Class of Tests of a Simple Hypothesis Specifying the 
Mean of a Unit Rectangular Distribution. Allan Birnbaum, Columbia University. 

2. Application of Random Walk Theory to a General Class of Sequential Decision Problems.
Preliminary Report. G. E. Albert, University of Tennessee. 

3. Nonparametric Comparisons of Populations When Data Are Collected in Homogeneous 
Groups. Frank J. Massey, Jr., University of Oregon. 

4. On the Reduced Moment Problem. Salem H. Khamis, Statistical Office, United Nations.

5. Canonical Partial Correlations. S. N. Roy and J. Whittlesey, University of North
Carolina.

6. A Useful Transformation in the Case of Canonical Partial Correlations. S. N. Roy,
University of North Carolina.

7. Uniform Convergence of Distribution Functions. Emanuel Parzen, University of 
California, Berkeley. 

8. Statistical Aspects of a Linear Programming Problem. D. F. Votaw, Jr., Yale Uni- 
versity. 

9. Maximum Likelihood Estimators and A Posteriori Distributions. (By title.) J. Wolfo-
witz, Cornell University. 

10. Estimates and Asymptotic Distributions of Certain Statistics in Information Theory. 
Preliminary Report. (By title.) John P. Hoyt, U. S. Naval Academy. Introduced by
S. Kullback.

11. On Testing One Simple Hypothesis against Another. (By title.) Lionel Weiss, Uni- 
versity of Virginia. 

12. Extreme Value Theory for m-dependent Stationary Sequences of Continuous Random 
Variables. (By title.) Geof. Watson, University of Melbourne. 

13. Sequential Tests and Estimates for Comparing Poisson Populations. (By title.) Allan 
Birnbaum, Columbia University. 

14. Sequential Decision Problems in the Stationary Case. (By title.) J. Kiefer, Cornell 
University. 

15. Random Functions Satisfying Certain Linear Relations. II. (By title.) Sudhish G. 
Ghurye, University of North Carolina. 

16. Optimal Designs for Estimating Parameters. Preliminary Report. (By title.) Herman 
Chernoff, Stanford University. 

17. The Distribution of the nth Variate in Certain Chains of Serially Dependent Populations. 
(By title.) L. V. Toralballa, Marquette University. Introduced by Joseph Talacko. 

18. An Experimental Method for Obtaining Random Digits and Permutations. (By title.) 
J. E. Walsh, U. S. Naval Ordnance Test Station, China Lake.

19. Distribution of Canonical Partial Correlations. (By title.) S. N. Roy, University of
North Carolina. 


At 2:00 P.M. on Friday, September 5, a session on the Comparison of Experi- 
ments was held with the co-sponsorship of the Econometric Society. The chair- 
man was Professor M. A. Girshick, Stanford University, and the following papers 
were presented: 


1. Equivalent Methods of Comparison. David H. Blackwell, Howard University. 
2. Approximate Comparison of Experiments. Charles Stein, University of Chicago. 


At 3:00 P.M. on Friday, September 5, the final session of the meeting was 
held. This session was one of invited papers, and its chairman was Professor J. 
Neyman, University of California, Berkeley. The invited addresses and dis-
cussions were the following: 


1. Estimates of Bounded Relative Error in Particle Counting. (Based on joint work of 
M. A. Girshick, H. Rubin, and Rosedith Sitgreaves.) Rosedith Sitgreaves, Stanford 
University. 

2. Multiple Comparison Procedures in the Analysis of Variance. Henry Scheffé, Columbia 
University. 

Discussion: Robert Bechhofer, Cornell University, and J. W. Tukey, Princeton 
University. 

3. Topics in Stationary Time Series. (Based on joint work with Murray Rosenblatt.)
Ulf Grenander, University of Stockholm and University of Chicago.
Discussion: Murray Rosenblatt, University of Chicago. 


The Council met at 8:00 P.M. on Tuesday, September 2, and a Business 
Meeting was held at 9:00 A.M. on Wednesday, September 3. At both of these 
meetings Professor M. A. Girshick presided. A banquet was held on the evening 
of September 3, and an I.M.S. party on the evening of September 4. 

William Kruskal
Associate Secretary 